ABSTRACT


The multi-backend database system (MBDS) in the
Laboratory for Database Systems Research at the Naval
Postgraduate School is designed to overcome the
performance-gain and capacity-growth problems of both the
traditional database system and the software single-backend
database system. The original MBDS supported four primary
operations, namely RETRIEVE, DELETE, UPDATE and INSERT.

This  thesis  presents  the  design  and  implementation  of 
the  fifth  primary  operation,  the  RETRIEVE-COMMON  operation. 
The  retrieve-common  operation  is  used  to  merge  two  files  by 
their common attribute values. First, the overall design
and implementation of MBDS are reviewed. Then, several
alternatives are compared and analyzed in order to select the
best one as our design and implementation approach. Finally,
we describe the detailed design and the implementation. Our
goal is to maximize the utilization of the existing system and
to minimize the effects on it.

To integrate our design into MBDS, several
modifications are made. The algorithms for the
modifications and their program specifications are
provided in Chapters IV and V and in the Appendices.
I. INTRODUCTION


A. THE SCOPE OF THE THESIS

A database is a collection of stored operational data,
and a database system is a computer-based system whose
overall purpose is to record and maintain information (data)
[Ref. 1]. The traditional approach to managing the database
system is to run the database system software as an
application program on a mainframe computer system. The
database system must share the use and the control of the
mainframe computer resources with all of the other
applications of the computer system. The performance of
this approach suffers whenever there is an increase in
either the usage of the computer system or the database
applications.

One solution to this problem is to offload the database
system from the mainframe to a single, dedicated backend
computer. The backend computer has its own disk storage and
is used to perform database operations exclusively
[Refs. 2, 3]. This approach is known as the software
single-backend approach. Database systems based on this
approach are referred to as software single-backend database
systems. However, this approach still has a disadvantage:
performance upgrades require the replacement of the
backend, which may entail software modifications and
hardware disruption [Ref. 4: p. 4].

A  second  approach  to  solve  the  database  performance 
problem  is  to  develop  a  special-purpose  database  machine 
with  specially  designed  hardware.  However,  the 
cost-effectiveness  of  this  approach,  known  as  the  hardware 
backend  approach,  has  not  yet  been  demonstrated  [Ref.  5]. 
In order to overcome the performance-gain and
capacity-growth problems of both the traditional database
system and the software single-backend system, research on
a multi-backend database system, known as MBDS, is being
conducted in the Laboratory for Database Systems Research at
the Naval Postgraduate School. Instead of a single backend
computer, MBDS uses several identical (both in hardware and
in software) minicomputers as its backend computers in a
parallel fashion in order to achieve performance gain and
capacity growth. These backends, with their respective disk
systems, are connected to another minicomputer, called the
backend controller. The controller is responsible for
supervising the execution of parallel database operations on
the backends and for interfacing with the hosts and the
user. Users access the system either by way of the host or
through the controller directly (as shown in Figure 1.1).


Figure 1.1 The Multi-Backend Database System (MBDS).
The attribute-based data language (ABDL) [Ref. 6] is
used as the basis of the data language of MBDS. Currently,
ABDL supports four primary database operations: RETRIEVE,
DELETE, UPDATE and INSERT. The functions of these four
database operations are shown in Figure 1.2.


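+-----------+------------------------------------+
| Operation | Function                           |
+-----------+------------------------------------+
| RETRIEVE  | Retrieve records from the database |
| DELETE    | Delete records from the database   |
| UPDATE    | Modify records of the database     |
| INSERT    | Insert records into the database   |
+-----------+------------------------------------+

Figure 1.2 The Functions of the
Current MBDS Database Operations.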


In order to make MBDS a more complete database system,
a fifth operation, the RETRIEVE-COMMON operation, which is
used to merge two files by common attribute values, has been
proposed [Ref. 7]. This thesis focuses on the design and
implementation of the RETRIEVE-COMMON operation of MBDS. We
propose several alternative design and implementation
strategies, then evaluate and analyze these alternatives
based on their time complexities, their effects on the
existing system and the design goals of MBDS. According to
the results of the analysis, we choose the best
alternative to design and implement the fifth operation.


B. THE ORGANIZATION OF THE THESIS


The rest of this thesis is organized as follows. In
Chapter II we give an overview of the architecture of
MBDS. We describe the design goals, the underlying and
intended hardware, the process structure, the data model and
the data language of MBDS. In Chapter III, we first
define the intended operation and the syntax of the
RETRIEVE-COMMON operation, and then evaluate and analyze the
alternatives for the design and implementation. According
to the analysis, we select the best alternative for adding
the retrieve-common operation to MBDS. In Chapter IV,
we present the details of the design for the selected
approach. We also consider the possible effects of this
approach on the existing system. In Chapter V, we describe
how to incorporate our design into MBDS. Our goal is to
minimize the effects of the implementation. Finally, this
thesis is summarized and concluded in Chapter VI. It is
hoped that this thesis will be of help to future work on
MBDS.


II. THE MULTI-BACKEND DATABASE SYSTEM (MBDS)


In this chapter we briefly review the configuration
and the theory of operation of MBDS. Most of the
information provided in this chapter has been extracted from
[Refs. 4, 7: pp. 1-68, 7-20]. Interested readers are
encouraged to consult these references.

A. THE SYSTEM GOALS

As mentioned in Chapter I, MBDS is designed to overcome
the performance problems and upgrade issues of the
traditional mainframe-based or the software single-backend
database system. In other words, the overall goal for MBDS
is to prove that:

(1) the system is easily extensible; and

(2) the performance gain and improvement are
proportional to the multiplicity of processing and
storage elements [Ref. 4: pp. 1-5].

In order to achieve the aforementioned goal, the design
requirements and their correlated design issues for
designing and implementing MBDS have been defined in
[Ref. 7: pp. 7-10].

1. Design Requirements

There are three main design requirements for MBDS.

(1) The system must be expandable.

(2) Both the hardware and the software are generic.

(3) The database is evenly distributed across the disk
systems of the backends and, during operation,
transactions are processed by the backends in parallel
and concurrently.

The first two design requirements support performance
enhancement and capacity growth through the addition of
new backends of the same type running the existing system
software. With the third requirement, performance gain (in
terms of response-time reduction) and capacity growth (in
terms of response-time invariance) of the system are likely
to be in proportion to the number of backends of the system.

2. Design Issues

There are several issues which must be resolved in
order to meet the design requirements of MBDS. The first
issue concerns the backend controller. As shown in Figure
1.1, the controller may become a primary bottleneck of the
system. In order to avoid this problem, the functions of
the controller should be minimized and reduced to the
pre-processing of the user transactions, the post-processing
of the transaction results, the sending and receiving of data
between the backends and the host, and the arbitration of
data insertion into the database.

The  second  design  issue  addresses  the 
characteristics  and  functionality  of  the  communication  bus 
between  the  controller  and  the  backends.  The  bus  should  be 
cost-effective  and  efficient  for  both  backend  communication 
and  backend  addition. 

The  third  class  of  issues  involves  the  backends  of 
the  system.  The  backends  must  have  identical  software  to 
allow  replication  of  the  software  on  a  new  backend. 
Additionally,  the  backends  must  have  complete  software  to 
perform  all  of  the  database  management  functions.  These 
functions  include  directory  management,  concurrency  control, 
record  processing  and  communication. 

The  fourth  design  issue  concerns  the  database.  The 
database  should  be  evenly  distributed  across  all  the  disk 
systems  of  the  backends. 


The  fifth  design  issue  is  on  the  choice  of  a  data 
model  and  data  language.  The  data  model  should  easily 
support  the  required  data  distribution  and  the  data 
placement  of  the  database.  The  data  language  for  the  system 
is  of  course  based  on  the  chosen  data  model.  It  must 
capture  all  of  the  primary  operations  of  the  database 
system.  The  chosen  data  model  is  the  attribute-based  data 
model  and  the  data  language  is  the  attribute-based  data 
language. 

The  sixth  design  issue  focuses  on  minimizing  the 
communications  traffic  of  the  system.  The  controller  should 
only  communicate  with  the  backends  for  sending  the 
pre-processed  user  transaction,  for  arbitrating  the  data 
placement, and for receiving results. The backends should
only communicate with the controller for sending the results
of the user transactions. Communication among backends
should be held to a minimum.

The  seventh  issue  deals  with  the  directory  placement 
strategies.  In  order  to  enable  each  backend  to  perform  all 
the  database  management  functions  and  minimize  the 
communication  among  backends,  the  directory  data  are 
duplicated  at  each  backend. 

B. THE UNDERLYING AND INTENDED HARDWARE

An overview of the MBDS hardware organization is shown in
Figure 2.1. User access is accomplished through a host
computer which in turn communicates with the controller.
When a transaction (either a request or a set of requests)
is received, the controller broadcasts the transaction
to all the backends. Since the data of all data files are
evenly distributed across all the backends, all backends can
execute the same request in parallel. A queue of
requests is maintained in each backend. When a backend
finishes executing one request, it sends the results of
that request to the controller and is able to start
executing the next request independently of the other backends.
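
The broadcast-and-collect behavior described above can be sketched as follows. This is a minimal illustration only, not the MBDS implementation: the names `Backend`, `distribute_evenly` and `broadcast` are hypothetical, records are plain Python dictionaries, and a thread pool stands in for the physically separate backend computers.

```python
from concurrent.futures import ThreadPoolExecutor


class Backend:
    """Hypothetical stand-in for one backend and its disk system."""

    def __init__(self, records):
        self.records = list(records)

    def execute(self, query_fn):
        # Each backend scans only its own share of the data.
        return [r for r in self.records if query_fn(r)]


def distribute_evenly(records, n_backends):
    # Round-robin placement so every backend holds about the same amount.
    return [Backend(records[i::n_backends]) for i in range(n_backends)]


def broadcast(backends, query_fn):
    # The controller sends the same request to all backends, lets them
    # work in parallel, and concatenates the partial results.
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        partial_results = list(pool.map(lambda b: b.execute(query_fn), backends))
    return [r for part in partial_results for r in part]


records = [{"FILE": "Employee", "SALARY": 1000 * i} for i in range(1, 9)]
backends = distribute_evenly(records, 2)
high_paid = broadcast(backends, lambda r: r["SALARY"] > 5000)
```

In this sketch the controller does nothing but forward the request and gather the answers, which mirrors the goal of keeping the controller's work small.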

Originally, MBDS was designed to be configured with a
number of microprocessor-based processing units and their
disk subsystems, connected by a broadcast-based
communications line. When the implementation of MBDS began,
neither the microprocessor-based computers nor the
broadcast-based communications devices were available. The
present MBDS is configured with a VAX-11/780 (VMS OS) as
both the host and the controller and two PDP-11/44s (RSX-11M
OS) and their disk systems as the backends. Communication
between computers is accomplished by
time-division-multiplexed buses, known as parallel
communication links (PCLs). The broadcast bus is
simulated by the PCL.

Currently, MBDS is being downloaded to an initial
configuration of eight microprocessor-based,
broadcast-bus-connected, and Winchester-drive-supported
workstations, with one of the eight being used as the
controller and the others as the backends. Each workstation
(Sun-2/170, 4.2 BSD UNIX OS) has the Motorola MC68010 as the
CPU with 16 Mbytes of virtual space per process and uses
Ethernet as the broadcast bus among workstations. The disk
drives on the backends are Fujitsu Eagle Winchester-type
drives, with a formatted capacity of 380 Mbytes per drive.

C. THE DATA MODEL AND THE DATA LANGUAGE

In this section we first introduce the concepts and
terminology of the attribute-based data model, which is the
data model used in MBDS, and then describe the data language
in which users may issue requests to MBDS.


1. The Attribute-based Data Model

MBDS uses the attribute-based data model as
its data model. In the attribute-based data model, data is
modeled with the constructs: database, file, record,
attribute-value pair (keyword), directory keyword,
directory, record body, keyword predicate, and query.
Informally, a database is a collection of files. Each file
contains a group of records which are characterized by a
unique set of directory keywords. A record is composed of
two parts. The first part is a collection of
attribute-value pairs or keywords. An attribute-value pair
is a member of the Cartesian product of the attribute name
and the value domain of the attribute. As an example,
<SALARY, 30000> is an attribute-value pair having 30000 as
the value for the attribute SALARY. All the attributes in a
record are required to be distinct. Certain
attribute-value pairs of a record (or a file) are called the
directory keywords of that record (file), because either the
attribute-value pairs or the ranges of their attribute
values are kept in the directory for addressing the record
(file). The rest of the record is textual information which
is referred to as the record body.

The angle brackets, <, >, enclose an attribute-value
pair. The curly brackets, {, }, enclose the record body.
The parentheses, (, ), enclose a record. The first
attribute-value pair of every record of a file is the same.
In particular, the attribute is FILE and the value is the file
name. An example of a record of the Employee file is shown
below:

(<FILE, Employee>, <JOB, Mgr>, <DEPT, Toy>, <SALARY, 30000>,
{Employee Description})


The  record  has  four  keywords  and  a  record  body  of  employee 
description. 
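
As a rough illustration of these constructs, the record above might be represented in Python as shown below. This is only a sketch under assumptions: the class name `Record`, its fields and the `value` helper are hypothetical, and the job keyword is read as <JOB, Mgr>. The two checks encode the rules stated in the text, namely that attributes within a record are distinct and that the first keyword names the file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical model of an attribute-based record."""

    keywords: tuple  # ordered (attribute, value) pairs
    body: str = ""   # textual record body

    def __post_init__(self):
        attributes = [attribute for attribute, _ in self.keywords]
        if not attributes or attributes[0] != "FILE":
            raise ValueError("the first keyword must be <FILE, file name>")
        if len(attributes) != len(set(attributes)):
            raise ValueError("all attributes in a record must be distinct")

    def value(self, attribute):
        # Look up the value paired with an attribute, or None if absent.
        return dict(self.keywords).get(attribute)


employee = Record(
    keywords=(
        ("FILE", "Employee"),
        ("JOB", "Mgr"),
        ("DEPT", "Toy"),
        ("SALARY", 30000),
    ),
    body="Employee Description",
)
```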

A keyword predicate, or simply predicate, is of the
form

(attribute, relational operator, value).

Where no confusion arises, we also use parentheses to enclose
a predicate. A relational operator can be one of (=, !=, <,
=<, >, >=). For example, (SALARY > 20000) is a predicate. A
keyword K is said to satisfy a predicate T if the attribute
of K is identical to the attribute in T and the relation
specified by the relational operator of T holds between the
value of K and the value in T. For example, the keyword
<SALARY, 30000> satisfies the predicate (SALARY > 20000).

A query consists of several keyword predicates in
disjunctive normal form. An example of a query is:

((DEPT=Toy) and ((SALARY<10000) or (SALARY>20000))).
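
The definitions of predicate satisfaction and of a query can be made concrete with a short sketch. Everything named here is an assumption for illustration: the functions `satisfies` and `record_satisfies`, the Python-style operator spellings `!=` and `<=`, the representation of a query as a list of conjunctions, and the sample query with its Book department.

```python
import operator

# Relational operators, written with Python-style spellings (an assumption).
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def satisfies(keyword, predicate):
    """A keyword (attribute, value) satisfies (attribute, op, value) when the
    attributes are identical and the relation holds between the two values."""
    attribute, value = keyword
    p_attribute, op, p_value = predicate
    return attribute == p_attribute and OPERATORS[op](value, p_value)


def record_satisfies(record, query):
    """A query in disjunctive normal form is a list of conjunctions; each
    conjunction is a list of predicates. The record qualifies when, for at
    least one conjunction, every predicate is satisfied by some keyword."""
    return any(
        all(any(satisfies(kw, pred) for kw in record) for pred in conjunction)
        for conjunction in query
    )


# Hypothetical query:
# ((DEPT=Toy) and (SALARY>20000)) or ((DEPT=Book) and (SALARY<10000))
query = [
    [("DEPT", "=", "Toy"), ("SALARY", ">", 20000)],
    [("DEPT", "=", "Book"), ("SALARY", "<", 10000)],
]
record = [("FILE", "Employee"), ("DEPT", "Toy"), ("SALARY", 30000)]
matches = record_satisfies(record, query)
```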

2. The Attribute-based Data Language

The data manipulation language for MBDS, the
attribute-based data language (ABDL), is a non-procedural
language which originally supports four primary database
operations: RETRIEVE, INSERT, DELETE and UPDATE. It is the
purpose of this thesis to design and implement the fifth
primary database operation, the RETRIEVE-COMMON operation.

The RETRIEVE request is used to retrieve records of
the database. The syntax of a RETRIEVE request is shown
below:

RETRIEVE Query (Target-List) [BY Attribute] [WITH Pointer]

The query specifies which records are to be retrieved. The
target-list is a list of output attributes. It may also
consist of an aggregate operator on one or more output
attributes. MBDS supports five aggregate operators:
AVG, COUNT, SUM, MIN and MAX. The BY-clause and the
WITH-clause are optional. The BY-clause may be used to group
records when an aggregate operation is specified. The
WITH-clause may be used to specify whether pointers to the
retrieved records must be returned to the user or user
program for later use in an update request. Some examples of
retrieve requests are shown below.

Example 1. Retrieve the names of all employees who work in
the Toy department.

RETRIEVE ((FILE=Employee) and (DEPT=Toy)) (NAME)

Example 2. List the average salary of each department.

RETRIEVE (FILE=Employee) (AVG(SALARY)) BY DEPT
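
To show what the two example requests compute, here is a small in-memory sketch. It is not ABDL and makes several assumptions: records are Python dictionaries, a query is passed as a Python function, the helper names `retrieve` and `retrieve_avg_by` are hypothetical, and the three employee records are invented sample data.

```python
from collections import defaultdict

# Invented sample data for illustration only.
employees = [
    {"FILE": "Employee", "NAME": "Ann", "DEPT": "Toy", "SALARY": 20000},
    {"FILE": "Employee", "NAME": "Bob", "DEPT": "Toy", "SALARY": 30000},
    {"FILE": "Employee", "NAME": "Cho", "DEPT": "Book", "SALARY": 40000},
]


def retrieve(records, query_fn, target_list):
    # Keep the records that satisfy the query and project them onto
    # the attributes named in the target list.
    return [
        {a: r[a] for a in target_list if a in r}
        for r in records
        if query_fn(r)
    ]


def retrieve_avg_by(records, query_fn, attribute, by):
    # AVG(attribute) grouped by the BY attribute.
    groups = defaultdict(list)
    for r in records:
        if query_fn(r) and attribute in r and by in r:
            groups[r[by]].append(r[attribute])
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Example 1: names of all employees in the Toy department.
names = retrieve(
    employees,
    lambda r: r["FILE"] == "Employee" and r["DEPT"] == "Toy",
    ["NAME"],
)

# Example 2: average salary per department.
averages = retrieve_avg_by(
    employees, lambda r: r["FILE"] == "Employee", "SALARY", "DEPT"
)
```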

The INSERT request is used to insert a record into
the database. The syntax of an INSERT request is:

INSERT Record

The following example inserts a record into the Employee
file.

INSERT (<FILE, Employee>, <SALARY, 30000>, <DEPT, Toy>)

The syntax of a DELETE request is:

DELETE Query

where the query specifies the record(s) to be removed from
the database. The following example deletes records from
the Employee file.

DELETE ((FILE=Employee) and (SALARY=30000) and (DEPT=Toy))

The UPDATE request is used to modify records of the
database. The syntax of the UPDATE request is:

UPDATE Query <Modifier>

where the query specifies the particular records of the
database to be updated and the modifier specifies the
kind of modification that needs to be done on the records that
satisfy the query. The following example gives a $1000
raise to all employees.

UPDATE (FILE=Employee) <SALARY=SALARY+1000>
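
The effect of the INSERT, UPDATE and DELETE requests can be mimicked on an in-memory list, as in the sketch below. The function names and the dictionary representation are assumptions made for illustration, and the second inserted record is invented; the sketch says nothing about how MBDS distributes or stores records.

```python
def insert(db, record):
    # INSERT Record: add one record to the database.
    db.append(dict(record))


def delete(db, query_fn):
    # DELETE Query: remove every record satisfying the query.
    removed = sum(1 for r in db if query_fn(r))
    db[:] = [r for r in db if not query_fn(r)]
    return removed


def update(db, query_fn, modifier):
    # UPDATE Query <Modifier>: apply the modifier to every matching record.
    changed = 0
    for r in db:
        if query_fn(r):
            modifier(r)
            changed += 1
    return changed


db = []
insert(db, {"FILE": "Employee", "SALARY": 30000, "DEPT": "Toy"})
insert(db, {"FILE": "Employee", "SALARY": 20000, "DEPT": "Book"})

# UPDATE (FILE=Employee) <SALARY=SALARY+1000>
raised = update(
    db,
    lambda r: r["FILE"] == "Employee",
    lambda r: r.update(SALARY=r["SALARY"] + 1000),
)

# After the raise, the Toy employee earns 31000 and is then deleted.
deleted = delete(
    db,
    lambda r: r["FILE"] == "Employee"
    and r["SALARY"] == 31000
    and r["DEPT"] == "Toy",
)
```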

The RETRIEVE-COMMON request is used to merge two
files by common attribute values. It is discussed in detail
in the later chapters.

D. THE PROCESS STRUCTURE

MBDS is a message-oriented system. In a
message-oriented system, each process corresponds to one
system function. These processes communicate among
themselves by passing messages. The processes are created at
system start time and exist until the system is stopped.
Figure 2.2 provides an overview of the MBDS process structure.


1. The Communication Processes

Communication between computers in MBDS is achieved
by using the PCL. MBDS provides a software abstraction of
this bus for each computer in order to emulate broadcast
capabilities. The abstraction consists of two complementary
processes. The first process, get-pcl, gets messages from
other computers off the PCL. The second process, put-pcl,
puts messages on the bus to be broadcast to other
computers. Every computer, whether it is the controller or a
backend, has its own get-pcl and put-pcl.

There are 31 message types and one general message
format used in the MBDS message-passing facilities. The
format (shown in Figure 2.3) is used for each of the three
message-passing facilities, namely, messages within the
controller, messages within the backends, and messages
between computers.


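+------------------+----------------------------------+
| A Message        | Data Type                        |
+------------------+----------------------------------+
| Message Type     | a numeric code                   |
| Message Sender   | a numeric code                   |
| Message Receiver | a numeric code                   |
| Message Text     | an alphanumeric field terminated |
|                  | by an end-of-message marker      |
+------------------+----------------------------------+

Figure 2.3 The General Format of MBDS Messages.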
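
The general message format can be pictured as a small data structure with three numeric codes and a terminated text field. The sketch below is only one possible encoding: the class name `Message`, the comma-separated layout and the choice of a NUL character as the end-of-message marker are all assumptions, since the text does not specify the wire representation.

```python
from dataclasses import dataclass

END_OF_MESSAGE = "\x00"  # hypothetical end-of-message marker


@dataclass
class Message:
    msg_type: int   # numeric code
    sender: int     # numeric code
    receiver: int   # numeric code
    text: str       # alphanumeric field

    def encode(self) -> str:
        if END_OF_MESSAGE in self.text:
            raise ValueError("text must not contain the end-of-message marker")
        header = f"{self.msg_type},{self.sender},{self.receiver}"
        return f"{header},{self.text}{END_OF_MESSAGE}"

    @classmethod
    def decode(cls, wire: str) -> "Message":
        if not wire.endswith(END_OF_MESSAGE):
            raise ValueError("missing end-of-message marker")
        payload = wire[: -len(END_OF_MESSAGE)]
        # Split only three times so that commas inside the text survive.
        msg_type, sender, receiver, text = payload.split(",", 3)
        return cls(int(msg_type), int(sender), int(receiver), text)


original = Message(7, 1, 2, "RETRIEVE (FILE=Employee) (NAME, SALARY)")
round_trip = Message.decode(original.encode())
```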

Messages between computers are divided into two classes:
messages between backends and messages between the
controller and the backends. Figure 2.4 describes each of the
MBDS message types.

2. The Test Interface Process

The test interface process allows the user to
interact with MBDS directly. Since MBDS does not use a
host computer, the test interface process is contained in
the controller.

3. The Processes of the Controller

In addition to the communication and test-interface
processes, the controller consists of three other
processes: Request Preparation (RP), Insert Information
Generation (IIG) and Post Processing (PP). RP receives,
parses and formats a request (transaction) before sending
the formatted request (transaction) to the
directory-management process in each backend. IIG is used
to provide additional information to the backends when an
insert request is received. PP is used to collect all the
results of a request (transaction) and forward the results
to the user.

4. The Processes of Each Backend

In addition to the communication processes, each
backend also consists of three other processes: Record
Processing (RP), Directory Management (DM) and Concurrency
Control (CC).

DM  controls  the  execution  of  a  request  at  a  backend 
and  accesses  the  secondary-storage-based  directory  tables. 
It  determines  the  disk  addresses  where  the  relevant  data  of 
a  particular  request  are  stored  and  then  sends  those  disk 
addresses  to  RP. 

CC is used to ensure the consistency of the database
while allowing concurrent execution of multiple requests.

RP performs the disk I/O operations and other
operations specified by the request. It receives the
disk addresses from DM and processes the request.
The results are then forwarded to the controller.
Figure 2.4 The MBDS Message Types.

III. DESIGN AND ANALYSIS OF THE RETRIEVE-COMMON REQUEST


In this chapter, we introduce the terminology and
notation of the "Retrieve-Common" request, investigate and
analyze several possible design and implementation
approaches, and then select the best one to design and
implement the Retrieve-Common operation for MBDS. The
selection of an approach is based on the design
requirements and the design issues of MBDS.

A. THE INTENDED OPERATION

1. An Operation On Two Files

The RETRIEVE-COMMON request is used to merge two
files by common attribute values. The common attribute
values are the attribute values which belong to the records
of both files. For example, suppose there are two files:
file A and file B. File A contains the records of the
street names of the city of San Jose:

(<FILE, A>, <STREET, MONTEREY>, <CITY, SAN JOSE>)
(<FILE, A>, <STREET, SECOND>, <CITY, SAN JOSE>)

File B consists of the records of the city names of
Monterey County:

(<FILE, B>, <CITY, MONTEREY>, <COUNTY, MONTEREY>)
(<FILE, B>, <CITY, SEASIDE>, <COUNTY, MONTEREY>)

The RETRIEVE-COMMON request can provide us with a third file,
say, file C, with information such as: "All the records
of both files A and B, where the street name of the records
in file A is identical to the city name of the records in
file B." One of the records in file C which satisfies the
request would be

(<FILE, C>, <FILE, A>, <STREET, MONTEREY>, <CITY, SAN JOSE>,
<FILE, B>, <CITY, MONTEREY>, <COUNTY, MONTEREY>).

Logically, the retrieve-common request involves two
retrieval operations. We define the first retrieval
operation as the source retrieve and the second retrieval
operation as the target retrieve. The set of all the
records that belong to the result of the source retrieve is
called the source record set. The set of all the records
that belong to the result of the target retrieve is called
the target record set. A source (target) record is a
record that belongs to the source (target) record set.
Similarly, the attributes of these records are referred to
as source attributes and target attributes. The merged
source and target records are termed the result record set.
The aforementioned file C is a result record set.

We term the source and target attribute names that
participate in the retrieve-common operation the join
attribute names, or briefly join attributes. Their
values are termed common attribute values, or simply common
values. The retrieve-common operation requires that the
join attribute which is specified in the source record set
must have the same domain as that of the join attribute in
the target record set, although they need not have the same
attribute name.

As another example, suppose the source records
are characterized by the attributes Employee_name and Wages,
and the target records are characterized by Rank and Wages.
Further, let the domain of Employee_name be the
character string and the domain of both Rank and Wages be
the integer. A retrieve-common operation may be performed
by merging on the attribute values of the wages of the
respective source record and target record. A
retrieve-common operation may also be performed by merging
on the wages of the source record and the ranks of the
target record, since their value domains are the same.
However, a merge between the employee names and the ranks
would not be permitted, since their domains are different.

The logical operation for the retrieve-common
request can be described as follows.

(1) All records satisfying the source retrieve are
collected.

(2) All records satisfying the target retrieve are
collected.

(3) The records of the two collections are pairwise merged
on the common (source and therefore target) attribute
values.
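
The three logical steps can be sketched on the street and city example given earlier in this section. This is a minimal sketch under stated assumptions, not the design chosen in this thesis: records are tuples of (attribute, value) pairs so that a merged record may repeat attribute names, the helpers `value_of` and `retrieve_common` are hypothetical, and the pairwise merge is carried out with a hash lookup purely for brevity.

```python
from collections import defaultdict


def value_of(record, attribute):
    # Return the first value paired with the attribute, or None.
    for name, value in record:
        if name == attribute:
            return value
    return None


def retrieve_common(source_set, target_set, source_attr, target_attr, result_file):
    # Step (3): pairwise merge on the common attribute values.
    targets_by_value = defaultdict(list)
    for target in target_set:
        value = value_of(target, target_attr)
        if value is not None:
            targets_by_value[value].append(target)

    result = []
    for source in source_set:
        value = value_of(source, source_attr)
        for target in targets_by_value.get(value, []):
            merged = (("FILE", result_file),) + tuple(source) + tuple(target)
            result.append(merged)
    return result


database = [
    (("FILE", "A"), ("STREET", "MONTEREY"), ("CITY", "SAN JOSE")),
    (("FILE", "A"), ("STREET", "SECOND"), ("CITY", "SAN JOSE")),
    (("FILE", "B"), ("CITY", "MONTEREY"), ("COUNTY", "MONTEREY")),
    (("FILE", "B"), ("CITY", "SEASIDE"), ("COUNTY", "MONTEREY")),
]

# Steps (1) and (2): collect the source and the target record sets.
source_set = [r for r in database if value_of(r, "FILE") == "A"]
target_set = [r for r in database if value_of(r, "FILE") == "B"]

file_c = retrieve_common(source_set, target_set, "STREET", "CITY", "C")
```

Only the Monterey street record of file A pairs with the Monterey city record of file B, so the result record set holds a single merged record.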

2. The Syntax of the Retrieve-Common Request

When developing the syntax of the retrieve-common
request, we must attempt to design a data language construct
that is similar, syntactically, to the other primary
operations of ABDL. In particular, the syntax of the
retrieve-common operation should resemble the syntax of the
ABDL retrieve operation given below:

RETRIEVE Query (Target-list) [BY Attribute] [WITH Pointer]

Using the above syntax as a guideline, we define the syntax
for the retrieve-common request as follows.

RETRIEVE Query-1 (Target-list-1) [BY Attribute] [WITH Pointer]
COMMON (Attribute-1, Attribute-2)
RETRIEVE Query-2 (Target-list-2) [BY Attribute] [WITH Pointer]

The retrieve-common request consists of three parts.
The first part is what we have referred to as the source
retrieve request, which retrieves the source record set.
The second part is the specification of the join attributes,
where Attribute-1 belongs to the source record and
Attribute-2 belongs to the target record. Although the
values of these two attributes must be the same in order to
satisfy the condition for merging the respective records,
their attribute names need not be identical. The third part
is what has been referred to as the target retrieve request,
which retrieves the target record set.

B. AN ANALYSIS OF DIFFERENT DESIGNS

In  order  to  make  this  thesis  self-contained,  several 
possible  design  approaches  described  in  [Ref.  8]  are 
reviewed  in  this  section. 

The  main  issue  when  considering  alternative  strategies 
for  implementing  the  retrieve-common  request  is  where  the 
merge  of  the  source  and  the  target  records  should  be 
performed. 

There are three major alternatives for distributing the
workload of the retrieve-common request.

(1) The controller does all of the merge operation.

(2) The controller and the backends share the workload of
the merge.

(3) The backends do all of the merge operation.

Each of these alternatives will be analyzed and judged using
the design requirements and design issues of MBDS.

In order to simplify the analysis of design (or
implementation) strategies, we make the following
assumptions.

(1) The records of the source record set and the records
of the target record set are distributed evenly across
the backends.

(2) The operation of the retrieve-common is performed as
described in the previous section.

1. The Controller Does All the Merge Operation

In this alternative, each backend only performs
the two retrieval operations and then sends the records of
the source record set and the records of the target record
set to the controller. Upon receiving all the source records
and target records from all the backends, the controller
performs the merge operation and sends the results to the
host computer.

2. The Controller And The Backends Share The Merge Operation

Each backend performs the merge operation over its own
source records and target records. The merged records, along
with the source and target record sets, are then sent to the
controller. The controller performs the merge operation
over the source and target record sets coming from different
backends and then sends the results, together with the
previously merged records (produced by the individual
backends), to the host.

3. The Backends Do All the Merge Operation

This  alternative  may  be  further  broken  into  two 
subalternatives. 

(a) The backends share the merge operation.
The backends send either source or target records to
each other. Let's assume that the target records are
sent. Each backend will then have a portion of the
source record set and the whole set of target records.
Each backend performs the merge operation over its own
source records and all of the target records, and
sends the results to the controller.

(b) One designated backend performs the merge operation.
All records of both the source record set and the
target record set are sent to the designated backend
from all of the other backends. The designated
backend performs the entire merge operation and sends
the results to the controller.

4. An Analysis of the Design Approaches

Four alternatives for distributing the workload of
the merge operation among the controller and the backends
have been discussed in the previous subsections. We now
examine these alternatives against the design goals of MBDS.

Alternative 1, where the controller performs the
entire merge operation, will increase the workload of the
controller. Recall that in Chapter II we have stressed that,
in order to reduce the chance of the controller being the
bottleneck of the system, we minimize the work of the
controller. Alternative 1 violates this design requirement.
Therefore, it will not be considered further.

Alternative 2 will increase the communications load
and increase the workload of the controller. This
alternative complicates the first and the sixth design
issues of MBDS. Therefore, it will also be eliminated from
the design consideration.

Alternative 3a meets the design issue of minimizing
the controller function and distributes the workload to
each backend evenly. Alternative 3b does not increase the
workload of the controller, but neither does it distribute
the workload to each backend. Furthermore, transmitting all
the records of both the source record set and the target
record set will increase the communications overhead. In
addition, performing the entire merge operation in one
backend will unbalance the workload, thereby reducing the
parallelism of the backends, i.e., by having a single
backend do the merge while all other backends idle. This
complicates both the third and sixth design issues, so this
alternative is also eliminated.

With this analysis we choose alternative 3a as
our design approach. That is, each backend performs a
merge with its portion of the source records and all of the
target records, and then sends its result to the
controller. The controller forwards the final result to the
host computer.
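The chosen approach can be illustrated with a minimal sketch. It assumes that records are plain Python dictionaries, that the merging condition is equality on the common attribute, and that the broadcast is simulated by concatenating lists. The names `merge` and `alternative_3a` are hypothetical and do not come from the MBDS software.

```python
def merge(source_records, target_records, attr):
    """Pair every source record with every target record that shares attr."""
    return [
        {**t, **s}
        for s in source_records
        for t in target_records
        if s[attr] == t[attr]
    ]


def alternative_3a(backends, attr):
    """Simulate alternative 3a for a list of (local_source, local_target) pairs."""
    # Broadcast step: every backend ends up holding all of the target records.
    all_targets = [t for _, local_target in backends for t in local_target]
    results = []
    for local_source, _ in backends:
        # Each backend merges only its own source portion with all targets,
        # then sends its partial result to the controller.
        results.extend(merge(local_source, all_targets, attr))
    return results


backends = [
    ([{"dept": 10, "name": "Lee"}], [{"dept": 20, "city": "Salinas"}]),
    ([{"dept": 20, "name": "Kim"}], [{"dept": 10, "city": "Monterey"}]),
]
print(alternative_3a(backends, "dept"))
```

In this sketch the matching records sit on different backends, so neither backend could produce a result from its local records alone. The broadcast of the target records is what makes the per-backend merges add up to the complete result.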

C. AN ANALYSIS OF DIFFERENT IMPLEMENTATIONS

Three different implementations for merging the source
and the target record sets are considered.

(1) A straightforward implementation.

(2) An implementation based on sorting and matching.

(3) An implementation based on bucket-hashing.

1. The Straightforward Implementation

The concept of this alternative is very simple and
the merging operation is based on the "nest-loop" algorithm
[Ref. 8: p. 86], which is shown in Figure 3.1.

This alternative is accomplished in five phases:

(1) Each backend determines its own source records and
stores them into a predefined portion of the secondary
storage area.

(2) Each backend determines its own target records and
stores them into the predefined portion of the
secondary storage area.
PROCEDURE Nest_loop_merge

FOR each record in the source record set DO
    FOR each record in the target record set DO
        IF the merging condition is satisfied
        THEN
            form a result record
        END IF
    END FOR
END FOR

END PROCEDURE Nest_loop_merge


Figure 3.1 The Nest-loop Merge Procedure.


(3) Each backend broadcasts its own local target records
to all of the other backends.

(4) Each backend receives the broadcast target records
from the other backends and stores them into the
secondary storage together with its own target
records.

(5) Each backend brings its own source records and the
entire target record set into the primary memory,
performs the "nest-loop" merging operation and then
sends the merged results to the controller.
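As an illustration of the merging step in phase (5), the nest-loop procedure can be sketched in a few lines. The sketch assumes that records are Python dictionaries held entirely in primary memory and that the merging condition is equality of the common attribute. The function name and the sample data are hypothetical.

```python
def nest_loop_merge(source_records, target_records, common_attr):
    """Merge two record sets by comparing every source/target pair."""
    results = []
    for s in source_records:
        for t in target_records:
            # Merging condition: equal values of the common attribute.
            if s[common_attr] == t[common_attr]:
                merged = dict(t)   # start from the target record
                merged.update(s)   # add the source record's attributes
                results.append(merged)
    return results


source = [{"dept": 10, "name": "Lee"}, {"dept": 20, "name": "Kim"}]
target = [{"dept": 10, "city": "Monterey"}, {"dept": 30, "city": "Salinas"}]
print(nest_loop_merge(source, target, "dept"))
```

Every source record is compared with every target record, so the number of comparisons is the product of the two set sizes regardless of how many pairs actually match.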


2. The Implementation Based on Sorting and Matching


The  idea  of  this  implementation  is  based  on  the 
following  inference. 

Since the retrieve-common operation is simply a merging
operation on two files of record sets, if we can have
these two files presorted by the values of their common
attributes then the merging operation may be efficiently
performed by matching the values of the common
attributes of the records of these two files.

There are two possible alternatives to perform the
sort-match algorithm.

(a)  The  backends  do  all  of  the  sorting  and  matching 
operations. 

(b)  The  backends  and  the  controller  share  the  sorting  and 
matching  operations. 

Alternative (b) will increase the workload of the
controller and contradict the design goals of MBDS, and
is therefore eliminated from consideration. Only
alternative (a) will be examined. Alternative (a)
accomplishes the retrieve-common operation in four phases.


(1) Each backend retrieves, sorts and stores its own source
records and target records separately, and then
broadcasts either set of records to the other
backends. (Let's assume that the target records are
transmitted.)

(2) Each backend receives and merges the incoming
non-local target records into its own local target
records.

(3) Each backend performs the matching operation over its
own portion of source records and the entire set of
target records (from all the backends).

(4)  The  backends  send  the  results  to  the  controller. 
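The matching operation of phase (3) can be sketched as a single pass over two sorted lists. This is only an illustration: it sorts in primary memory with Python's built-in `sorted`, whereas the record sets considered here would need an external sort. Records as dictionaries, equality as the merging condition, and the function name are assumptions.

```python
def sort_match_merge(source_records, target_records, common_attr):
    """Sort both record sets on the common attribute, then match in one pass."""
    key = lambda r: r[common_attr]
    src = sorted(source_records, key=key)
    tgt = sorted(target_records, key=key)
    results = []
    i = j = 0
    while i < len(src) and j < len(tgt):
        if key(src[i]) < key(tgt[j]):
            i += 1
        elif key(src[i]) > key(tgt[j]):
            j += 1
        else:
            value = key(src[i])
            # Find the run of target records that share this value.
            j_end = j
            while j_end < len(tgt) and key(tgt[j_end]) == value:
                j_end += 1
            # Pair each source record in the run with each target in the run.
            while i < len(src) and key(src[i]) == value:
                for t in tgt[j:j_end]:
                    merged = dict(t)
                    merged.update(src[i])
                    results.append(merged)
                i += 1
            j = j_end
    return results


source = [{"k": 2, "a": "x"}, {"k": 1, "a": "y"}, {"k": 2, "a": "z"}]
target = [{"k": 2, "b": "p"}, {"k": 3, "b": "q"}, {"k": 2, "b": "r"}]
print(sort_match_merge(source, target, "k"))
```

Once both sets are sorted, each list is scanned only once, apart from the runs of duplicate values, which is why the sorting cost dominates this approach.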

3. The Implementation Based on Bucket-Hashing

This  implementation  strategy  attempts  to  speed  up 
the  comparison  and  merge  by  hashing  records  into  small 
groups  (the  buckets  of  the  hashing  table)  which  contain 
records  with  common  attribute  values,  so  that  the  time 
complexity  of  the  merging  operation  may  be  reduced. 


A hashing function applied to the common attribute
values is used to hash records into buckets. The bucket
numbers are consecutive integers. Instead of using primary
and overflow areas, the buckets use one or more fixed-size
blocks to store records. The number of blocks may vary
among buckets. Details of the hashing table, the buckets
and the blocks will be described in the next chapter.

Those source records and target records within the
same bucket will be examined and merged if the merging
condition is satisfied. This alternative can also be broken
into two subalternatives.

(a) One common hashing table is used for both source and
target record sets.

(b) Two separate tables are used, one for each record set.

a.  One  Common  Hashing  Table 

This  alternative  is  accomplished  by  each  backend 
in  four  phases: 

(1) All local source records will be hashed and stored
into blocks according to their hashed values. These
blocks (therefore buckets) are termed source blocks
(buckets).

(2) After all the local source records have been hashed,
the local target records are hashed one at a time and
buffered. If the target record is hashed into an
empty source bucket, then it is buffered for
transmitting to other backends. Otherwise, all the
records in the source bucket will be retrieved and
merged with that target record only if the merging
condition is satisfied. The results are first
buffered and then sent to the controller.

(3) Since the non-local target records may arrive at a
backend while the backend is processing some other
records, each backend will place these incoming
records on a predefined secondary storage area.

(4) Each backend retrieves the non-local target records
from the secondary storage area and processes them in
the same way as the backend does its local
target records.
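The core of this subalternative, hashing the source records once and then looking up one target record at a time, can be sketched as follows. The sketch keeps all buckets in primary memory and leaves out the buffering and transmission of target records. The table size, the use of Python's built-in `hash`, and all names are assumptions made for illustration.

```python
NUM_BUCKETS = 8  # hypothetical table size


def bucket_number(value):
    """Map a common attribute value to a bucket number."""
    return hash(value) % NUM_BUCKETS


def hash_source_records(source_records, common_attr):
    """Phase 1: place every local source record into its source bucket."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for s in source_records:
        buckets[bucket_number(s[common_attr])].append(s)
    return buckets


def process_target_record(t, buckets, common_attr):
    """Phases 2 and 4: merge one target record with its source bucket."""
    results = []
    for s in buckets[bucket_number(t[common_attr])]:
        # Records in one bucket may still differ, so compare the values.
        if s[common_attr] == t[common_attr]:
            merged = dict(t)
            merged.update(s)
            results.append(merged)
    return results


buckets = hash_source_records([{"k": 1, "a": "x"}, {"k": 9, "a": "y"}], "k")
print(process_target_record({"k": 9, "b": "z"}, buckets, "k"))
```

With 8 buckets the values 1 and 9 fall into the same bucket, which is why the explicit comparison inside the bucket is still needed.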

b. Separate Hashing Tables

This alternative is accomplished in three phases.

(1) The backends will hash and store their own source
records and target records into two separate hashing
tables by a common hashing function. After all of the
target records have been hashed and stored, each
backend will broadcast the hashed results of its
target records (i.e., the bucket number and the
records associated with that bucket number) to all of
the other backends.

(2) Upon receiving all of the target information from the
other backends, each backend stores those target
records into appropriate buckets according to their
bucket numbers.

(3) The backends perform the merge operation on the local
source records and the entire set of target records
and send the results to the controller. The procedure
is shown in Figure 3.2.
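The three phases with two separate tables can be sketched as below. The sketch assumes in-memory tables, a hypothetical table size, Python's built-in `hash` as the common hashing function, and a simple list concatenation in place of the broadcast. None of the names are taken from the MBDS software.

```python
from collections import defaultdict

NUM_BUCKETS = 8  # hypothetical table size


def bucket_number(value):
    """Common hashing function shared by both tables."""
    return hash(value) % NUM_BUCKETS


def build_table(records, common_attr):
    """Hash records into a table that maps bucket number to a record list."""
    table = defaultdict(list)
    for r in records:
        table[bucket_number(r[common_attr])].append(r)
    return table


def hashing_merge(source_table, target_table, common_attr):
    """Merge bucket by bucket, skipping bucket numbers empty in either table."""
    results = []
    for b in range(NUM_BUCKETS):
        src_bucket = source_table.get(b)
        tgt_bucket = target_table.get(b)
        if not src_bucket or not tgt_bucket:
            continue
        # Within one bucket pair, fall back to the straightforward merge.
        for s in src_bucket:
            for t in tgt_bucket:
                if s[common_attr] == t[common_attr]:
                    merged = dict(t)
                    merged.update(s)
                    results.append(merged)
    return results


local_source = [{"dept": 10, "name": "Lee"}, {"dept": 18, "name": "Kim"}]
local_target = [{"dept": 10, "city": "Monterey"}]
remote_target = [{"dept": 18, "city": "Salinas"}, {"dept": 5, "city": "Carmel"}]

source_table = build_table(local_source, "dept")
target_table = build_table(local_target + remote_target, "dept")
print(hashing_merge(source_table, target_table, "dept"))
```

Only records that fall into the same bucket number are ever compared, so the nested loops run over small groups instead of over the whole record sets.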

4. Comparison of the Three Implementation Approaches

In  this  section  we  compare  and  analyze  these 
implementation  approaches.  Since  the  backends  work  in 
parallel,  our  analysis  only  focuses  on  how  much  time  it 
takes  for  one  backend  to  do  one  particular  strategy.  There 
are  common  operations  that  each  backend  performs,  so  that 
the  time  complexities  for  these  operations  can  be  ignored 
when  comparing  the  implementation  strategies.  The  times  of 
these common operations are:


PROCEDURE Hashing_merge

FOR the bucket_value = min_value to max_value DO
    IF the buckets of both tables are not empty
    THEN
        retrieve all the records from both buckets
        perform merge operation based on
        the straightforward algorithm
    END IF
END FOR

END PROCEDURE Hashing_merge


Figure 3.2 The Hashing_merge Procedure.


(1) the time to process the records for the source request,
which involves determining which records of the
database satisfy the query, projecting the
attribute-value pairs of the target-list of the
satisfied records and forming a source record set;

(2) the time to process the records for the target
request, which involves determining which records of
the database satisfy the query, projecting the
attribute-value pairs of the target-list of the
satisfied records and forming a target record set;

(3) the time to broadcast the local target records to the
other backends; and

(4) the time to send the merged results to the controller.

The following notations are introduced to simplify the
ensuing analysis.

Cs : Cardinality of the source record set in one backend.

Ct : Cardinality of the target record set in one backend.

Cb : Average number of records in a bucket.

M  : Number of backends.

B  : Number of index entries in the hashing table.

Ti : Average time to read (write) a block of records from
(to) secondary storage.

Tb : Average time to read (write) a record from (to) a
bucket.

Tc : Average time to compare the common attribute values
of two records.

Th : Average time to hash a record.

Tm : Average time to merge two records.


a. An Analysis for the Straightforward Implementation
We recall that there are five phases in this
implementation as discussed in a previous section.

Phase 1: Since there are Cs local source
records in each backend, the time complexity for storing
them into the secondary storage is:

Ti*(Cs/Cb).

Phase 2: Since there are Ct local target
records in each backend, the time complexity for storing
them into secondary storage is:

Ti*(Ct/Cb).

Phase 3: The time complexity for this phase is
ignored.

Phase 4: Since each backend receives (M-1)*Ct
target records from the other backends, the time complexity
for storing them in the secondary storage is:

(M-1)*(Ct/Cb)*Ti.

Phase 5: Records are merged in this phase.
There are Cs source records and M*Ct target records in each
backend. Each block of the source records is compared and
merged with all of the target records. It takes Ti to bring
one block of source records into the primary memory from the
secondary storage and M*(Ct/Cb)*Ti for the entire target
record set.

It takes Cb*Tb to access one block of source
records and M*Ct*Tb to access all of the target records.
The time complexity for comparing one block of the source
records and all of the target records is

Cb*M*Ct*Tc.

We further assume that a fraction k of the target
records participates in the merging operation. The time
complexity for merging one block of source records and all
of the target records becomes:

k*M*Ct*Tm.

The total time complexity for processing one block of source
records of this implementation is:

[Ti + M*(Ct/Cb)*Ti] + (Cb + M*Ct)*Tb + (Cb*M*Ct*Tc) + (k*M*Ct*Tm).

There are Cs/Cb blocks of source records in each
backend; therefore, the time complexity of this alternative
is:

(Cs/Cb) * {[Ti + M*(Ct/Cb)*Ti] + (Cb + M*Ct)*Tb
+ (Cb*M*Ct*Tc) + (k*M*Ct*Tm)}
or

(M*Cs*Ct) * [Ti/Cb^2 + (Tb + k*Tm)/Cb + Tc] + Ti*(Cs/Cb) + Cs*Tb.

Because Cs may be equal to Ct and M is a small constant, the
time complexity may be further simplified to be

O(Cs*Ct) or

O(Cs^2).

b.  An  Analysis  for  the  Sort-Matching  Implementation 

We  will  analyze  each  phase  of  this 
implementation  approach. 

Phase 1: Each backend sorts its two record sets
and broadcasts the sorted target record set to the other
backends. Due to the large size of the record sets, the
sorting operation cannot be done by using an internal
sorting algorithm. There are several external sorting
algorithms which can sort the local source records and the
local target records with the time complexities of
O(Cs*log(Cs)) and O(Ct*log(Ct)), respectively. However,
these algorithms all have some limitations: either using a
special hardware configuration or running different software
among processors [Refs. 9,10].

Because we do not want to put limitations on the
hardware configuration of MBDS or to use different software
among the backends, this alternative is eliminated from our
consideration.


c. An Analysis for the Bucket-Hashing Implementation

In  order  to  further  simplify  our  analysis,  we 
assume  that  the  local  source  records  and  target  records  can 
be  evenly  hashed  across  all  the  buckets  of  the  hashing 
tables  and  each  bucket  will  contain  only  one  block  of  local 
source  records  or  one  block  of  local  target  records.  First, 
we  analyze  the  alternative  that  uses  only  one  hashing  table. 

Phase 1: Each source record needs to be hashed and
written into a bucket by its hashed value. This includes
getting the block of that bucket from the secondary storage,
writing the record into the block and returning the
block to the secondary storage. Therefore, the time
complexity for each backend to hash and store the source
records is:

Cs*(Th + Tb + 2Ti).

Phase 2: Every time a target record is hashed,
the bucket with that hashed value is checked. If the bucket
is not empty, then all the source records in that bucket
will be retrieved into the primary memory, compared with the
target record and merged with it if their common attribute
values are equal. The time complexity for bringing one bucket
(block) of source records into primary memory is Ti. The
time complexity for accessing those source records from the
block and comparing them with that target record is:

Cb*(Tb + Tc).

Suppose that the probability of hashing a target record into
a non-empty bucket is p and the probability of satisfying
the merging condition is f; then the time complexity for
each backend to process one local target record is:

Th + p*[Ti + Cb*(Tb + Tc + f*Tm)].

Because we assume the source records are evenly hashed
across the buckets of the hashing table, p is equal to 1.
There are Ct local target records in each backend, so that
the time complexity for each backend to process its local
target records is:

Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}.

Phase 3: Each backend receives (M-1)*Ct target
records from other backends. The time complexity for
storing these records back to the secondary storage is:

(M-1)*(Ct/Cb)*Ti.

Phase 4: It takes (M-1)*(Ct/Cb)*Ti for each backend
to retrieve all the non-local target records from the
secondary storage into the primary memory. The time
complexity for processing these records is:

(M-1)*Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}.

The time complexity of this phase is:

(M-1)*(Ct/Cb)*Ti + (M-1)*Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}.

The total time complexity of this alternative
for a backend is:

Cs*(Th + Tb + 2Ti) + 2(M-1)*(Ct/Cb)*Ti
+ M*Ct*{Th + [Ti + Cb*(Tb + Tc + f*Tm)]}.

Now, we analyze the other alternative which uses
two separate hashing tables.

Phase 1: The source records and the target
records will be hashed, grouped into the buckets of separate
hashing tables and then placed onto the secondary storage.
The time complexity for each backend to process its local
records is:

(Cs+Ct)*(Th + Tb + 2Ti).

Upon receiving the target records from the other
backends, each backend will insert those incoming records
into the hashing table of the target records and store them
back to the secondary storage. Since those non-local target
records are grouped and sent by their bucket numbers, the
insertion time is so quick that it may be ignored. By using
an inverted list, the time complexity for each backend to
return those incoming target records to the secondary
storage is:

(M-1)*(Ct/Cb)*Ti.

Phase 2: Records of these two hashing tables
will be processed one bucket at a time. For any bucket
number (i.e., a table entry), if the buckets of both hashing
tables are not empty, then all blocks of the records of both
buckets will be read into the primary memory for the merging
operation. It takes Ti for bringing one bucket of source
records (in this case, one block) into the primary memory
and M*Ti for one bucket of target records (M blocks). The
time complexity for accessing, comparing and possibly
merging one bucket of source records with one bucket (M
blocks) of target records (not including the disk I/O time)
will be:

Cb*[Tb + M*Cb*(Tb + Tc + f*Tm)].

The expected time complexity for all buckets will be:

(Cs/Cb)*Cb*[Tb + M*Cb*(Tb + Tc + f*Tm)].

Therefore, the total time complexity for this alternative
is:

(Cs+Ct)*(Th + Tb + 2Ti) + (M-1)*(Ct/Cb)*Ti
+ (Cs/Cb)*Cb*[Tb + M*Cb*(Tb + Tc + f*Tm)].


     | One Common Table           | Two Separate Tables
-----+----------------------------+------------------------
Th   | Cs+M*Ct                    | (Cs+Ct)
Tb   | Cs+Ct*M*Cb                 | (M+2)*Cs+Ct
Tc   | M*Ct*Cb                    | Cs*M*Cb
Ti   | 2Cs+M*Ct+2(M-1)*(Ct/Cb)    | (Cs+Ct)+(M-1)*Ct/Cb
Tm   | M*Ct*Cb*f                  | Cs*M*Cb*f

Figure 3.3 The Time Complexities of the
Bucket-Hashing Implementations.


A summary of the time complexity in terms of Th,
Tb, Tc, Ti and Tm for these two subalternatives is shown in
Figure 3.3. As shown in Figure 3.3, the alternative which
uses two separate tables is better than the one which
employs only one table. Since Cb and M are constants, f is
smaller than 1 and Ct may be equal to Cs, we can further
simplify the time complexity of the two-separate-tables
subalternative to be:

O(Cs+Ct) or
O(Cs).
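To make the difference in growth rates concrete, the two total-cost expressions can be evaluated numerically. The sketch below encodes the straightforward total and the two-separate-tables total as they are read from the analysis above. All parameter values are hypothetical and chosen only for illustration (times in arbitrary units).

```python
def straightforward_cost(Cs, Ct, Cb, M, Ti, Tb, Tc, Tm, k):
    """Cost of the merging phase, repeated for each of the Cs/Cb source blocks."""
    per_block = (Ti + M * (Ct / Cb) * Ti   # disk reads
                 + (Cb + M * Ct) * Tb      # record accesses
                 + Cb * M * Ct * Tc        # comparisons
                 + k * M * Ct * Tm)        # merges
    return (Cs / Cb) * per_block


def two_table_hashing_cost(Cs, Ct, Cb, M, Ti, Tb, Tc, Th, Tm, f):
    """Cost of bucket-hashing with two separate hashing tables."""
    hash_and_store = (Cs + Ct) * (Th + Tb + 2 * Ti)
    store_incoming = (M - 1) * (Ct / Cb) * Ti
    merge = (Cs / Cb) * Cb * (Tb + M * Cb * (Tb + Tc + f * Tm))
    return hash_and_store + store_incoming + merge


# Hypothetical parameter values, not measurements.
params = dict(Cb=10, M=4, Ti=30.0, Tb=0.01, Tc=0.01, Tm=0.05)
for n in (1000, 2000):
    s = straightforward_cost(n, n, k=0.1, **params)
    h = two_table_hashing_cost(n, n, Th=0.02, f=0.1, **params)
    print(n, round(s), round(h))
```

Under these assumed values, doubling Cs and Ct roughly quadruples the straightforward cost but only doubles the hashing cost, which matches the quadratic and linear orders derived in the text.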


d.  The  Conclusion  for  Our  Implementation  Approach 

A summary of the analysis for those
implementation approaches in terms of time complexity is
shown in Figure 3.4. Clearly, the one based on
bucket-hashing with two separate hashing tables is the best
approach. Therefore, our implementation will be based on
that approach. The details of design and implementation
will be discussed in the next chapter.


Straightforward     | O(Cs^2)
Sorting-Matching    | Not considered


IV. DETAILED DESIGN FOR IMPLEMENTING RETRIEVE-COMMON
OPERATION INTO MBDS


In the previous chapter, a bucket-hashing based
implementation approach has been selected for implementing
the retrieve-common operation into MBDS. In this chapter, we
focus on specifying the details of that approach and discuss
any of the existing MBDS software which will be affected by
this implementation. Our primary goal is to use the
existing software as much as possible and to minimize the
effects which may be caused by the implementation.

The operations of the retrieve-common request may be
described in four phases. First, the user's request must be
preprocessed so that all backends can be informed by an
appropriate message. This is the request-preprocessing
phase. Second, the records of both the source and the
target record sets are retrieved before the merging
operation. This is the record-retrieving phase. Third,
those retrieved records are hashed on the values of their
join attributes and stored into a hashing table according to
their hashed values (i.e., the bucket numbers). We recall
that there are two hashing tables, one for the source
records and one for the target records. Further, the hashed
local target records are broadcast to the other backends.
This is the hashing-and-storing phase. Lastly, hashed
records of source buckets and hashed records of target
buckets are compared and merged bucket-by-bucket,
respectively. The merged results are sent to the controller
from all of the backends. This is the merging phase. The
controller then forwards those results to the host computer.

The operations of the first and second phases can be
done by the existing system software with minor
modifications. However, in order to accomplish the
operations of the last two phases, we must design a new set
of procedures, which we refer to as the hashing
module. In the remainder of this chapter, we first describe
the hashing module, and then the operations of those four
phases.

A. THE HASHING MODULE

This module is designed to accomplish the operations of
the last two phases of the retrieve-common request. There
are three procedures within this module. They are: the
hashing procedure, the bucket-block tracking procedure and
the merging procedure. In this section, we first discuss
the two different alternatives for implementing this
module. After choosing the better alternative, we then
describe the three procedures of the hashing module.

1. Alternatives for Implementing the Hashing Module

There are two alternatives that may be used for
implementing the hashing module. In the first alternative,
the hashing module is implemented as a separate process of
the backend. This alternative modifies the existing
process structure of a backend by introducing a sixth
process and its associated communication paths into each
backend. In the second alternative, the hashing module is
implemented as part of the existing record processing
process (RECP). This alternative leaves the existing backend
process structure unchanged.

a.  As  a  Separate  Process 

In this alternative, the hashing module is
designed as a separate process of the backend. The inputs
to the hashing module are either the local source or target
records from the local RECP or the other target records from
the RECPs of the other backends. The outputs from the
hashing module are the merged results, which are sent to the
controller. The transfer of records between processes
(i.e., non-local target records from "Put Pcl" to the
hashing module or the local source records or the local
target records from the local RECP to the hashing module) is
accomplished using the interprocess message capabilities of
each backend. The new process structure of each backend
with the additional communication paths is shown in
Figure 4.1. Since the hashing module is an independent
process, the effects of this implementation on the other
processes of MBDS may be minimized.


Figure 4.1 Hashing Module as a Separate Process.


b. As a Procedure within Record Processing


In this alternative, the hashing module is
designed as a group of procedures that are added to RECP.
In Figure 4.2 we show the structure of the hashing module
with RECP. The local records (both the source records and
the target records) are retrieved by the physical data
operation of RECP of each backend. Once the records are
retrieved, they are sent to the hashing module. The
non-local target records are received by RECP from the other
backends and then passed to the hashing module. The merged
results are then sent to the controller. With modularized
programming, the hashing module may be independently
implemented with a minimal effect on the original RECP
software.


RECP of Each Backend: Physical Data Operation ->
(Retrieved Local Records) -> Hashing Module; Aggregate
Operation


Figure 4.2 Hashing Module as Part of RECP


c.  Comparison  of  These  Two  Alternatives 

Both alternatives can be easily implemented with
minimal effect on the existing system. The difference
between these two alternatives is the way that the local
records are passed from the "physical data operation" to the
hashing module. In alternative (a), the records are passed
as an interprocess message. In alternative (b), the records
are passed as a parameter of a procedure call. We choose
alternative (b) for three reasons.

(1) The message-passing between two processes within a
backend is slower than the parameter-passing. In
message-passing, both processes have to access a
common memory to put (or get) a message. The accessing
time, coupled with the time required to place a
message in the common memory by the sender and fetch
the message from the common memory by the receiver, is
considerable. In parameter-passing, only the logical
address of the record buffer is passed between the
procedures, which is much simpler and faster.

(2) Even if message-passing within a computer is extremely
fast, the total time is still considerable, since a
large number of messages (i.e., records) must be
routed between the two processes.

(3) The extra communication paths required by alternative
(a) (i.e., the communication paths among the hashing
module and the other MBDS processes) increase the
number of messages passed within a backend and among
backends. By increasing the inter-backend and
intra-backend communication, we may adversely affect
the overall performance of a backend.
2. The Hashing Procedure


This procedure is used to perform the hashing
operation on the values of the join attributes of the input
records. The inputs to the procedure are either the local
source records or the local target records, which are
received from the physical-data-operation subprocess of
RECP. The outputs from the procedure are the input records
and their hashed values (i.e., the bucket numbers), which
are sent to the bucket-block tracking procedure with the
request id for further processing.

The hashing operation is done by the hashing
functions of this procedure. Since the type of the values
of the join attributes may either be an integer or a
character string, we have designed two hashing functions in
this procedure. Generally, a good hashing function should
satisfy the following three requirements:

(1)  All  of  the  records  should  be  evenly  distributed  into 
buckets  of  the  hashing  table; 

(2)  The  chance  of  hashing  different  records  into  the  same 
bucket  should  be  minimized;  and 

(3)  The  hashing  computation  should  be  fast. 

These  requirements  are  closely  related  to  the  number  of 
buckets  and  the  hashing  algorithm  which  is  used  in  the 
hashing  function. 
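As an illustration only, two simple hashing functions of the kind described, one for integer values and one for character strings, might look as follows. The division-remainder method and the multiplier 31 are common textbook choices assumed here and are not the functions designed for MBDS. The table size of 1024 buckets is likewise hypothetical.

```python
NUM_BUCKETS = 1024  # hypothetical number of buckets


def hash_integer(value, num_buckets=NUM_BUCKETS):
    """Division-remainder hashing for integer join-attribute values."""
    # Python's % yields a non-negative result for a positive modulus.
    return value % num_buckets


def hash_string(value, num_buckets=NUM_BUCKETS):
    """Fold the characters of a string into a bucket number."""
    h = 0
    for ch in value:
        h = (h * 31 + ord(ch)) % num_buckets
    return h


def bucket_number(value, num_buckets=NUM_BUCKETS):
    """Choose the hashing function by the type of the join-attribute value."""
    if isinstance(value, int):
        return hash_integer(value, num_buckets)
    return hash_string(str(value), num_buckets)


print(bucket_number(1030), bucket_number("MONTEREY"))
```

Both functions use only a few arithmetic operations per value, which addresses requirement (3). How evenly they spread the records depends on the actual attribute values and on the number of buckets.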

a.  The  Number  of  the  Buckets 

A hashing table with a large number of buckets
is useful for a number of reasons. First, the large number
of buckets may reduce the chance of hashing different
records into the same bucket. Second, the number of
records in each bucket is also quite small, and this will
reduce the access time during merging. However, it would be
impractical to have a table with a very large number of
bucket entries, where each bucket would only contain a few
records. When the table becomes exceedingly large, a
substantial cost is incurred to maintain the bucket index.
The bucket index of a hashing table is an array of
fixed-size bucket entries. There is a bucket entry for each
bucket to keep track of the records which are stored in that
bucket. Therefore, the number of buckets (and therefore the
bucket entries) can be computed by the following equation:

Let X be the size of the bucket index (measured in bytes), and
    Y be the size of a bucket entry (measured in bytes);
then the number of buckets is (X / Y).

For example, if the size of the bucket index of a hashing table is 8K
bytes and the size of each bucket entry is 8 bytes, then the number of
bucket entries for that hashing table is 1K, i.e., 1024.
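The computation above can be sketched in a few lines of Python. This is
only an illustration; the function name number_of_buckets is
hypothetical and is not part of the MBDS software.

```python
def number_of_buckets(index_size_bytes: int, entry_size_bytes: int) -> int:
    """Return how many fixed-size bucket entries fit in the bucket index."""
    if entry_size_bytes <= 0:
        raise ValueError("the bucket entry size must be positive")
    return index_size_bytes // entry_size_bytes


K = 1024
# The example from the text: an 8K-byte index with 8-byte entries.
print(number_of_buckets(8 * K, 8))
```

With an 8K-byte index and 8-byte entries the function returns 1024,
which agrees with the figure worked out above.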

How should we determine the size of the bucket index of our hashing
table? Since MBDS allows the concurrent execution of different user
transactions, there may be a number of retrieve-common requests being
processed by the system. Each of the retrieve-common requests requires
two hashing tables, one table for the source record set and one table
for the target record set. Because of the potentially large number of
hashing tables concurrently in use, it will be necessary to store the
bucket indexes of the tables in the secondary storage and stage them
into the primary memory on demand. To make the staging of a bucket
index efficient, it is desirable to have the size of the bucket index
be a multiple of the unit of disk I/O transfer. For example, if the
unit of disk I/O transfer (which is typically the track size) is 4K
bytes, then the size of the bucket index shall be M*4K bytes, where M
is one of {1, 2, 3, ...}. In our case, we choose 16K bytes to be the
size of the bucket index of our hashing table, yielding 2048 entries
(therefore, 2048 buckets) in the hashing table, each with a bucket
entry size of 8 bytes.

b.  The  Hashing  Algorithm 

Since the value type of the join attribute may be either an integer
or a character string, we have designed two hashing functions, one for
each value type.

(1)  The Hashing Algorithm for the Integer-Valued Attributes. In
order to distribute the values of all join attributes evenly into the
buckets and to minimize the collisions, we use the information about
the maximum and minimum values of the join attributes. This
information is maintained in the record templates. The hashing
algorithm for the integer attribute value is described as follows.

Step 1:  Get the MAX (maximum) and MIN (minimum) values of
         the join attribute from the record template, let
         X = The_number_of_buckets_in_hashing_table

Step 2:  If MAX - MIN < X
         then go to step 4
         else Temp1 = (MAX - MIN) Div X

Step 3:  Get the input record and let
         Y = The_value_of_the_join_attribute
         bucket_number = (Y - MIN) Div Temp1
         go to step 5

Step 4:  Get the input record and let
         Y = The_value_of_the_join_attribute
         bucket_number = Y - MIN

Step 5:  Return the bucket number to the calling procedure.
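The steps above can be sketched in Python as follows. This is a minimal
sketch and not the MBDS code; the function name integer_bucket and its
parameter names are hypothetical. One addition is ours and is not part
of the steps: the result is clamped to the last bucket, because the
integer division in steps 2 and 3 can yield a number equal to or
greater than X for values near MAX.

```python
def integer_bucket(value: int, min_value: int, max_value: int,
                   num_buckets: int = 2048) -> int:
    """Map an integer join-attribute value to a bucket number."""
    if not min_value <= value <= max_value:
        raise ValueError("value lies outside the MIN..MAX range")
    value_range = max_value - min_value
    if value_range < num_buckets:
        # Step 4: the whole range fits in the table, so the offset
        # from MIN is used directly as the bucket number.
        bucket = value - min_value
    else:
        # Steps 2 and 3: scale the offset by Temp1 = (MAX - MIN) Div X.
        temp1 = value_range // num_buckets
        bucket = (value - min_value) // temp1
    # Assumption added for this sketch (not in the steps above):
    # keep the result inside the bucket index.
    return min(bucket, num_buckets - 1)


print(integer_bucket(150, 100, 1000))    # small range, offset from MIN
print(integer_bucket(5000, 0, 10000))    # large range, scaled by Temp1
```

For the second call the range is 10000, so Temp1 is 10000 Div 2048 = 4
and the value 5000 falls into bucket 1250.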

(2)  The Hashing Algorithm for the Character-Valued Attributes. The
record template does not provide the maximum and the minimum values
for the character-valued attributes as it does for integer-valued
attributes. In order to minimize collisions and distribute records
evenly into buckets, we design a lookup table, which is an array with
2048 character-string elements, to perform the hashing function. The
number of the elements is equal to the number of the entries in the
bucket index of the hashing table. The values of the join attributes
of the input records are searched against the contents of the lookup
table to obtain the bucket values. The binary search algorithm is used
to minimize the searching time of the lookup table.

The contents of the entries of the lookup table are created in the
following way:

(1)  Get an English dictionary with more than 2048 pages;

(2)  Divide the number of pages by the number of the buckets
(in our case the number is 2048);

(3)  Let the result be x.y, where x and y are positive
decimal digits;

(4)  Pick up the last word of every x.y page from the
dictionary and place the first four characters as an
entry in the lookup table; and

(5)  If the length of the selected word is less than 4,
fill the word with trailing blanks.

We  use  only  the  first  four  characters  to  compare  the  values 
of  join  attributes  for  two  reasons.  First,  we  believe  that 
there  are  very  few  English  words  that  will  have  the  same 
first  four  letters.  Second,  we  want  to  reduce  the 
primary-memory requirements for the lookup table.

The  algorithm  for  the  character-valued 
attributes  is  as  follows. 

Step 1:  Let MIN = 0 and MAX = 2047.

Step 2:  Get the input record and let
         X = The_value_of_the_join_attribute

Step 3:  If X > look_up_table[MAX]
         then bucket_number = MAX, go to step 5.

Step 4:  Use binary search to find the bucket number.

Step 5:  Return the bucket number to the calling procedure.
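A minimal Python sketch of this lookup is given below. It is an
illustration under stated assumptions, not the MBDS implementation: the
names make_key and string_bucket are hypothetical, the four-entry table
stands in for the 2048-entry table, and we assume that the bucket
number is the index of the first table entry that is not smaller than
the key (the text does not state how the binary search resolves a key
that falls between two entries).

```python
from bisect import bisect_left
from typing import Sequence


def make_key(value: str) -> str:
    """Keep the first four characters, padded with trailing blanks."""
    return value[:4].ljust(4)


def string_bucket(value: str, lookup_table: Sequence[str]) -> int:
    """Map a character-string join value to a bucket number."""
    key = make_key(value)
    last = len(lookup_table) - 1            # MAX in the steps above
    if key > lookup_table[last]:            # step 3
        return last
    # Step 4: binary search over the sorted lookup table.
    return bisect_left(lookup_table, key)


# A tiny, made-up lookup table in ascending order (the real one would
# hold 2048 entries taken from a dictionary).
table = ["dog ", "lamp", "rive", "zulu"]
for word in ("apple", "house", "river", "zzz"):
    print(word, string_bucket(word, table))
```

Because the table is sorted, bisect_left performs the binary search in
time proportional to the logarithm of the table size, i.e., about 11
comparisons for 2048 entries.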

3.  The Bucket-Block Tracking Procedure

The  input  to  this  procedure  may  be  either  the  local 
records  (either  the  source  records  or  the  target  records) 
with  their  bucket  numbers  from  the  hashing  procedure  or  the 
non-local  target  records  grouped  by  their  bucket  values  from 
the  other  backends.  The  outputs  from  the  procedure  are  the 
logical  addresses  of  the  hashing  tables  of  the  source 
request  and  the  target  request,  which  are  sent  to  the 
merging procedure for the merging operation. The bucket-block
tracking procedure performs three functions:

(1)  maintaining  a  global  table  to  keep  track  of  the 
logical  addresses  of  the  hashing  tables  for  all 
retrieve-common  requests  which  are  currently  being 
processed  in  the  system; 

(2)  maintaining a hashing table for the current request
and keeping track of all of the buckets and blocks of
that hashing table; and

(3)  storing  the  input  records  into  appropriate  buckets  and 
blocks  according  to  their  bucket  values. 

In  order  to  provide  a  better  understanding  of  this 
procedure,  we  first  introduce  the  structures  of  the  blocks, 
the buckets, the hashing table and the global table. We then
discuss how these functions are accomplished.

a.  The  Structure  of  a  Block 

Each block is divided into two parts: the header and the body. The
header has two fields. The first field is used to record the length
(in bytes) of the body, i.e., the total size of the records currently
stored in this block. The second field is used to store the logical
address of the next block whose records have the same bucket value as
this block. If there is no other block of the bucket, then there is a
null address in this field. The body is used to store the hashed
records and their common attribute values. Blocks which are in the
same bucket are maintained as an inverted list and tracked by their
logical addresses. The structures of the block and its header are
shown in Figure 4.3.

Figure 4.3  The Structures of Block and Its Header
            (panel B: The Structure of Block Header).


b.  The Structure of a Bucket

As  mentioned  in  chapter  II,  instead  of  using 
primary  and  overflow  areas,  each  bucket  uses  fixed-size 
blocks  to  store  records.  The  number  of  blocks  per  bucket 
may  vary  among  different  buckets.  The  bucket  entry  is  used 
to  indicate  the  status  and  to  keep  track  of  the  blocks  of 
that  bucket. 


Each bucket entry in the bucket index has two parts: the status and
the logical address of the block currently being used. The status is
used to indicate whether or not the bucket is empty. The size of the
bucket entry is 8 bytes, where 2 bytes are used for the status and 6
bytes are used for the logical address, which is represented by a
tuple consisting of the logical disk number, the logical cylinder
number and the logical track number. The structure of a bucket entry
is shown in Figure 4.4.

Figure 4.4  The Structure of a Bucket-entry.


c.  The  Structure  of  the  Hashing  Table 

A hashing table is an array of bucket entries. We anticipate that the
retrieve-common operation will be implemented on a SUN workstation
running the UNIX operating system, with a 16K unit of disk I/O. Using
the equation from the previous subsection, we can compute the number
of bucket entries for our hashing table to be 2048.
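The layout of the bucket index can be illustrated with the following
Python sketch. It is not the MBDS code. The text fixes only the split
of 2 bytes for the status and 6 bytes for the logical address; the
assumption that the disk, cylinder and track numbers take 2 bytes each,
the byte order, the status codes and all of the names below are ours.

```python
import struct

# Assumed layout: a 2-byte status followed by three 2-byte address
# fields (logical disk, logical cylinder, logical track).
BUCKET_ENTRY = struct.Struct("<HHHH")
EMPTY, NOT_EMPTY = 0, 1                 # hypothetical status codes
INDEX_SIZE = 16 * 1024                  # one 16K unit of disk I/O
NUM_BUCKETS = INDEX_SIZE // BUCKET_ENTRY.size


def new_index() -> bytearray:
    """Create a bucket index in which every bucket is EMPTY."""
    return bytearray(INDEX_SIZE)


def write_entry(index: bytearray, bucket: int, status: int,
                disk: int, cylinder: int, track: int) -> None:
    """Store one 8-byte bucket entry at its slot in the index."""
    BUCKET_ENTRY.pack_into(index, bucket * BUCKET_ENTRY.size,
                           status, disk, cylinder, track)


def read_entry(index: bytearray, bucket: int) -> tuple:
    """Return (status, disk, cylinder, track) for a bucket."""
    return BUCKET_ENTRY.unpack_from(index, bucket * BUCKET_ENTRY.size)


index = new_index()
write_entry(index, 2047, NOT_EMPTY, disk=1, cylinder=20, track=7)
print(NUM_BUCKETS, read_entry(index, 2047))
```

Keeping the index as one contiguous 16K byte array means that it can be
staged into the primary memory with a single disk transfer.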

d.  The  Global  Table 

Since MBDS allows concurrent processing during the retrieval
operation, there may be several retrieve-common requests in the
system. We need a table that keeps track of the logical addresses of
the hashing tables for each retrieve-common request. Each entry of the
global table contains two parts: the request id of the request and the
logical address of the hashing table for that request. The request id
consists of the traffic id, which is the unique identifier of a
traffic unit [Ref. 11: p. 41], and the request number, which indicates
the sequence of the request in the traffic unit. An entry of the
global table is created whenever a new hashing table is created, and
deleted when that request has completed processing. The structure of
the global table is shown in Figure 4.5.

Figure 4.5  The Structure of the Global Table.


e.  The  Sequence  of  the  Operations  of  the 
Bucket-block  Tracking  Procedure 

The steps of the sequence to accomplish the operations of this
procedure are described as follows.

Step 1:  Create and initialize the global table.

Step 2:  Check the request ID of the input records with the
global table to see if the input records belong to
a new request. If they do, then allocate a hashing
table for that request, initialize the bucket
index and store the logical address of the hashing
table into the global table. Otherwise, get the
existing hashing table into the primary memory
using the logical address information provided by
the global table.

Step 3:  Extract a record from the input buffer. If the
record is the first record of that request, then
go to step 10.

Step 4:  If the bucket value of this record is the same as
the previous one, then go to step 8.

Step 5:  Store the block which contains the previous record
back to the secondary storage.

Step 6:  Get the desired bucket entry (table entry) for the
record by its hashed bucket-value. Check the
status of the bucket. If it is "empty", then go
to step 11.

Step 7:  Get the currently used block by its logical
address in the bucket entry.

Step 8:  If there is space in the block that is available
for storing this record, then go to step 12.

Step 9:  Get a new block and put the current logical address
of the bucket entry into the "logical address of
next block" field of the block header. Then,
update the bucket entry with the logical address
of this new block. Go to step 12.

Step 10: Get the desired bucket entry by its hashed
bucket-value, update the status of that bucket
entry to "not empty".

Step 11: Get a new block and put its logical address into
the bucket entry.

Step 12: Store the record into the block and update the
"length of record" field of the block header.

Step 13: Repeat the steps 3 to 12 until all records have
been processed.

Notice that the block is not immediately returned to the secondary
storage after the insertion of one input record. Since the records in
MBDS are stored by clusters, it is very likely that records within the
same cluster will be retrieved together. Therefore, by keeping the
current block in the primary memory, we may save one store operation
and one read operation if the next input record is retrieved from the
same cluster and hashed into the same bucket (that is, the two records
have the same bucket value).
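The way records are chained into blocks can be sketched in Python as
follows. This is a simplified, in-memory illustration and not the MBDS
implementation: a dictionary stands in for the secondary storage, block
addresses are plain integers, the body size of 64 bytes is arbitrary,
every record is assumed to fit in an empty block, and the caching of
the current block (steps 4 and 5) is left out. All class and method
names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

BLOCK_BODY_SIZE = 64   # assumed body capacity in bytes (illustrative)


@dataclass
class Block:
    length: int = 0                    # header: bytes used in the body
    next_block: Optional[int] = None   # header: next block of the bucket
    body: List[bytes] = field(default_factory=list)


@dataclass
class BucketEntry:
    empty: bool = True                    # the status part
    current_block: Optional[int] = None   # the logical address part


class HashingTable:
    def __init__(self, num_buckets: int = 2048) -> None:
        self.index = [BucketEntry() for _ in range(num_buckets)]
        self.disk: Dict[int, Block] = {}  # stands in for secondary storage
        self._next_address = 0

    def _new_block(self, next_block: Optional[int]) -> int:
        address = self._next_address
        self._next_address += 1
        self.disk[address] = Block(next_block=next_block)
        return address

    def store(self, bucket: int, record: bytes) -> None:
        entry = self.index[bucket]
        if entry.empty:
            # Steps 10 and 11: mark the bucket and give it a first block.
            entry.empty = False
            entry.current_block = self._new_block(None)
        elif self.disk[entry.current_block].length + len(record) > BLOCK_BODY_SIZE:
            # Step 9: the new block points back to the block used so far.
            entry.current_block = self._new_block(entry.current_block)
        # Step 12: store the record and update the length field.
        block = self.disk[entry.current_block]
        block.body.append(record)
        block.length += len(record)

    def records(self, bucket: int) -> List[bytes]:
        """Follow the chain of blocks and collect the bucket's records."""
        found: List[bytes] = []
        address = self.index[bucket].current_block
        while address is not None:
            found.extend(self.disk[address].body)
            address = self.disk[address].next_block
        return found


table = HashingTable(num_buckets=4)
for i in range(5):
    table.store(1, bytes([i]) * 20)      # five 20-byte records
print(len(table.records(1)), table.index[1].current_block)
```

Since the bucket entry always holds the address of the most recently
allocated block, inserting a record never requires walking the chain;
only the merging step has to follow the "next block" addresses.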

4.  The Merging Procedure

This  procedure  is  used  to  perform  the  merging 
operation.  The  inputs  to  this  procedure  are  the  logical 
addresses  of  the  hashing  tables  of  the  source  request  and 
the  target  request,  which  come  from  the  bucket-block 
tracking  procedure.  The  outputs  from  this  procedure  are  the 
merged  results,  which  are  sent  to  the  controller. 

The algorithm of the merging procedure is as follows.

Step  1:  Reserve  a  result  buffer. 

Step  2:  Get  the  hashing  tables  of  the  source  request  and 
the  target  request  by  their  logical  addresses. 

Step  3:  Compare  the  bucket  statuses  of  these  two  hashing 
tables  bucket  by  bucket.  If  both  buckets  contain 
records  for  a  particular  bucket  number,  then 
retrieve  all  the  records  associated  with  this 
particular  bucket  value  from  both  tables. 

Step  4:  Apply  the  straightforward  merging  algorithm  on 
those  retrieved  records.  Insert  merged  results 
into  the  result  buffer. 


Step 5:  If the result buffer is full, then send its
contents  to  the  controller. 

Step  6:  Repeat  steps  3,  4  and  5  until  all  the  buckets  have 
been  processed. 

Step  7:  Free  the  result  buffer. 
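The merging steps can be sketched in Python as follows. This is only an
illustration under stated assumptions: each hashing table is modeled as
a list of buckets, each bucket as a list of records (dictionaries), the
"straightforward merging algorithm" is taken to be a comparison of
every source record with every target record of the same bucket, and a
partially filled result buffer is flushed at the end. The function name
merge_buckets and its parameters are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Record = Dict[str, object]
Pair = Tuple[Record, Record]


def merge_buckets(source: List[List[Record]], target: List[List[Record]],
                  join_attr: str, buffer_size: int = 2) -> Iterator[List[Pair]]:
    """Yield full result buffers of (source, target) record pairs."""
    buffer: List[Pair] = []                      # step 1
    for src_bucket, tgt_bucket in zip(source, target):
        # Step 3: only buckets that are non-empty in both tables matter.
        if not src_bucket or not tgt_bucket:
            continue
        # Step 4: compare every source record with every target record.
        for s in src_bucket:
            for t in tgt_bucket:
                if s[join_attr] == t[join_attr]:
                    buffer.append((s, t))
                    if len(buffer) == buffer_size:   # step 5
                        yield buffer
                        buffer = []
    if buffer:            # assumption: flush what remains at the end
        yield buffer


source = [[{"id": 1, "a": "x"}], [], [{"id": 5, "a": "y"}, {"id": 6, "a": "z"}]]
target = [[{"id": 1, "b": "p"}, {"id": 2, "b": "q"}],
          [{"id": 3, "b": "r"}],
          [{"id": 6, "b": "s"}]]
for batch in merge_buckets(source, target, "id"):
    print(batch)
```

Because records with equal join values always hash to the same bucket
number, comparing bucket i of the source table only with bucket i of
the target table is sufficient.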


B.  THE OPERATIONS OF THE FOUR PHASES

In this section we discuss the operations of each phase of the
retrieve-common request and the software which will be affected by
those operations.

1.  The Request-preprocessing Phase

a.  The  Operations 

The operations of this phase include parsing the user's transaction
(or request) and, if the transaction (request) is correctly parsed,
composing an appropriate message in the controller to inform the
backends to begin execution of the request. Since the retrieve-common
request is conceptualized and executed as two retrieval operations,
the parser has to parse the user's request and transform it from the
form of a single request to the form of a transaction with two
requests.

b.  The  Affected  Software 

Basically, the operations of this phase can be done by the existing
Request Preparation process. However, the software for this process
must be modified as follows:

(1)  The parser should be able to recognize the newly added
syntax and correctly parse the request;

(2)  The composer should be able to form a new message to
inform PP and all of the backends so that they can
perform the desired operation;

(3)  New message types are added for processing the
retrieve-common request; and

(4)  PP and all of the backends should be able to recognize
and process the newly created message for the
retrieve-common request.

2.  The Record-retrieving Phase

a.  The Operations

The operations of this phase include the address generation and the
record retrieval for both the source request and the target request.
These two requests are processed by DM in the same way as the other
four types of requests. As mentioned in the previous chapter, the
target records are processed after the source records. In order to
separate the records of these two requests, DM will first send the
source request and its associated address set to RECP, and hold the
target request and its address set until it receives a message from
RECP indicating that all source records have been retrieved.

The record-retrieving operation is performed by the
physical-data-operation subprocess in RECP as a regular retrieve
request. Instead of sending the retrieved records to the controller,
control logic is used to route them to the hashing module for hashing
and subsequent merging.

b.  The Affected Software

Most of the operations of this phase are done by DM, CC and the
Physical Data Operation of RECP in each backend. The software is
affected as follows:

(1)  We need to add control logic into DM so that the
address information of the source and target requests
will not be sent to RECP together; and

(2)  We need to add a new procedure to handle the
retrieve-common request and control logic to route
the results to the hashing module instead of to PP.

3.  The Hashing-and-storing Phase

This is the most important part of the retrieve-common request. All
of the records are prepared in this phase, so they can be merged in
the next phase. The operations of the hashing-and-storing phase
include:

(1)  performing  hashing  operations  on  the  local  records, 

(2)  table  maintenance  and  bucket-block  tracking 
operations,  and 

(3)  broadcasting  (and  receiving)  the  target  records  and 
their  bucket-values  to  (from)  the  other  backends. 

a.  The  Hashing  Operations 

This operation is performed by the hashing
procedure  of  the  hashing  module.  Upon  receiving  the  local 
records  from  the  previous  phase,  the  hashing  procedure  will 
check  the  record  template  to  get  the  value  type  of  the 
common  attribute  values  and  then  apply  an  appropriate 
hashing  function  to  hash  the  common  attribute  values.  The 
records  and  their  hashed  bucket-values  will  then  be  passed 
to  the  bucket-block  tracking  procedure  for  further 
processing. 

b.  Table-maintenance and Bucket-block Tracking Operation

This operation is done by the bucket-block tracking procedure. A
global table is maintained to store the addresses of all of the
hashing tables for all of the different retrieve-common requests
which are currently being processed by the system. Whenever a new
retrieve-common request is encountered, the bucket-block tracking
procedure will create a new hashing table for that request. The
logical address of the newly created hashing table is then stored
into the global table. The hashing table will be deleted when the
request is complete. Records are stored into buckets according to
their hashed values. The information in the bucket entries and the
block headers is maintained and updated by the bucket-block tracking
procedure as described in the previous section.

c.  Broadcasting and Receiving Target Records Between Backends

After the local target records have been hashed and processed, each
backend will buffer its local target records (retrieved from the
target-hashing table with their bucket values) and broadcast them to
the other backends. Upon receiving those non-local target records,
each backend will store them into the target-hashing table by their
bucket values. A checklist is used to ensure that the target
information from all of the other backends has been received.
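One simple way to realize such a checklist is sketched below in Python.
The text does not describe the checklist's structure, so everything
here is an assumption: backends are identified by small integers, each
backend is assumed to signal once when it has broadcast its last
target records, and the class and method names are hypothetical.

```python
from typing import Iterable, Set


class TargetChecklist:
    """Track which of the other backends still owe their target records."""

    def __init__(self, own_id: int, backend_ids: Iterable[int]) -> None:
        # A backend never waits for itself.
        self.pending: Set[int] = set(backend_ids) - {own_id}

    def mark_complete(self, backend_id: int) -> None:
        """Record that a backend has finished broadcasting."""
        self.pending.discard(backend_id)

    def all_received(self) -> bool:
        """True once every other backend has reported in."""
        return not self.pending


checklist = TargetChecklist(own_id=0, backend_ids=range(4))
for sender in (2, 1, 3):
    checklist.mark_complete(sender)
print(checklist.all_received())
```

Merging can begin only when all_received() is true, since until then
the target-hashing table may still be missing non-local records.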

d.  The  Affected  Software 

Since the operations of this phase are done by the hashing module,
RECP is affected to the extent that this module is integrated into
the RECP process. No other existing software will be affected.

4.  The  Merging  Phase 

This is the last phase of the retrieve-common operation. The local
source records and the entire set of target records are compared and
merged.

a.  The Operations


The  operations  are  performed  by  the  merging 
procedure  of  the  hashing  module.  Because  the  records  of 
both tables are unsorted, they are merged by using the
straightforward  algorithm.  The  merged  results  are  stored  in 
a  result  buffer  and  then  sent  to  the  controller. 

b.  The  Affected  Software 

Since this phase is also done by the hashing module, RECP is affected
to the extent that this module is integrated into the RECP process.
No other existing system software is affected.

V.  THE IMPLEMENTATION


In this chapter, we describe how the retrieve-common request is
integrated into the MBDS system. To successfully perform the
integration, it is necessary to modify a portion of the MBDS
software. Therefore, this chapter also discusses how the MBDS
software is modified for the integration and implementation of the
retrieve-common operation.

In  the  remainder  of  this  chapter  we  first  describe  the 
modified  processes  of  the  controller.  Second,  we  describe 
the  modified  processes  of  each  backend.  Then,  we  present 
the  modified  MBDS  message-passing  facilities.  Finally,  we 
trace  the  execution  sequence  of  the  retrieve-common  request 
in  terms  of  the  types  of  messages  that  are  passed  among  the 
MBDS  processes. 

A.  THE MODIFIED PROCESSES OF THE CONTROLLER

1.  The Request Preparation Process (REQP)

There are two subprocesses in REQP, namely the parser and the
composer. The parser parses the requests and checks for syntax
errors. The composer transforms the correctly parsed requests into
the form required for processing at the backends.

a.  The  Parser 

The parser does both the lexical and the syntactical analyses of the
ABDL transaction (or requests). The input to the parser is either a
request or a transaction. The outputs from the parser are the error
messages to the test interface, the aggregation operators to PP and
the correctly parsed requests to the composer.

The lexical analysis is done by the lexical analyzer produced by LEX
[Ref. 11: p. 42]. The input to LEX is a specification of the tokens
of the language (i.e., the tokens of ABDL) in the form of regular
expressions and a set of subroutines which specify the actions to be
taken upon recognition of the tokens. The syntactical analyzer is
generated by YACC (Yet Another Compiler Compiler) [Ref. 12]. The
input to YACC is a specification which includes the declarations of
the tokens' names, the rewriting rules of the grammar, and the action
program. YACC produces a C program to determine whether the input
ABDL transactions (requests) are syntactically correct.

For the parser to correctly parse the users' retrieve-common
requests, we have made several modifications to the original parser
subprocess. These modifications are listed below.


(1)  Regular expressions for LEX.

We have added a new set of regular expressions so
that the lexical analyzer can recognize the
retrieve-common request and generate appropriate
tokens, which in turn can be recognized and used by
YACC.

(2)  Grammar rules for YACC.

A new set of rules has been added into the original
ABDL grammar so that the parser can recognize those
tokens which are generated for the retrieve-common
request and organize those tokens by these newly
created rules.

(3)  The request type.

We have added a new request type, the retrieve-common
request, so that the parsed transaction can be
correctly identified and properly executed by the
composer and the other processes of MBDS.

(4)  The action program.

The input of the retrieve-common request to the parser
is in the form of a single request. The parser should
be able to parse this request and generate a
transaction of two retrieval requests (each of the
retrieve-common request type). If the join attribute
is not in the target list (of the source or the target
request), the action program inserts the join
attribute at the head of the target list. The extra
attribute-value pairs (i.e., the join attribute-value
pairs) of the retrieved records are deleted by the
merging procedure, so that the merged results contain
only the desired attribute-value pairs. The newly added
regular expressions, grammar rules and the SSL for
the modified action program are provided in Appendix
A.
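The effect of the action program on the target lists can be illustrated
with the following Python sketch. It is not the YACC action program of
Appendix A. The record layout, the field and function names, and the
placeholder query strings are all hypothetical; in particular, the flag
that remembers whether the join attribute was inserted is our own
device for showing how the merging procedure could later know which
attribute-value pairs to delete.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Request:
    request_type: str
    query: str                  # placeholder for the parsed query
    target_list: List[str]
    join_attribute: str
    join_attribute_added: bool  # True if the parser inserted it


def make_request(query: str, target_list: Sequence[str],
                 join_attribute: str) -> Request:
    """Build one retrieval request of the retrieve-common type."""
    added = join_attribute not in target_list
    # Insert the join attribute at the head of the target list if absent.
    targets = ([join_attribute] if added else []) + list(target_list)
    return Request("RETRIEVE-COMMON", query, targets, join_attribute, added)


def split_retrieve_common(source: tuple, target: tuple) -> Tuple[Request, Request]:
    """Turn one retrieve-common request into a transaction of two requests."""
    return make_request(*source), make_request(*target)


src, tgt = split_retrieve_common(
    ("source query", ["name"], "dept"),
    ("target query", ["dept", "budget"], "dept"),
)
print(src.target_list, src.join_attribute_added)
print(tgt.target_list, tgt.join_attribute_added)
```

In this example only the source request needs the join attribute
inserted, so only its join attribute-value pairs would be removed from
the merged results.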

b.  The  Composer 

The  composer  receives  the  correctly  parsed 
requests  from  the  parser  and  formats  them  into  the  required 
message format. Then, the composer broadcasts the formatted
messages  to  all  of  the  backends  for  execution.  We  have 
modified  the  original  composer  program  so  that  the  composer 
can  correctly  reformat  the  retrieve-common  request. 

2.  The Post Processing Process (PP)

The post processing process includes the aggregate post operation
and the reply monitor. The functions of PP are described in
[Ref. 11: p. 27]. The aggregate post operation is not modified. The
only modification in the reply monitor is to recognize the new
request type for the retrieve-common request.

B.  THE MODIFICATION OF THE BACKEND PROCESSES


As described in Chapter II, one of the design issues of MBDS is to
assign as much work as possible to the backends. Consequently, there
are more changes in the processes of each backend than changes in
the controller. The affected processes are directory management and
record processing.

1.  The Directory Management Process (DM)

DM receives the new transaction message for the retrieve-common
request from the request composer and then performs a number of
directory operations, which include attribute search, descriptor
search, cluster search, address generation and directory table
maintenance. From our earlier discussion, we know that the source
and target requests of a retrieve-common request should not be
processed concurrently by RECP. Therefore, DM will first process the
source request and send the request and its addresses to RECP. The
target request is held in DM until RECP notifies DM that the source
request has finished execution.

At  what  stages  of  the  DM  processing  do  we  hold  the 
target  request?  There  are  several  alternatives  for  holding 
the target request in DM. These alternatives are listed below.

(1)  Hold  the  target  request  without  performing  any 
directory  operation. 

(2)  Hold  the  target  request  after  it  completes  attribute 
search. 

(3)  Hold  the  target  request  after  it  completes  attribute 
search  and  descriptor  search. 

(4)  Hold the target request after it completes attribute
search, descriptor search and cluster search.

(5)  Hold the target request after it completes attribute
search, descriptor search, cluster search, and address
generation.

Alternatives  2,  3,  4,  and  5  will  generate  status  and 
directory  information  for  the  target  request  which  must  be 
held  somewhere.  Due  to  the  large  number  of  the  possible 
attributes,  the  size  of  the  status  and  directory  information 
may  be  too  big  to  be  kept  in  the  primary  memory,  i.e.,  they 
will  have  to  be  stored  back  to  the  secondary  storage.  The 
extra  disk  I/O  time  for  moving  the  status  and  directory 
information in and out of the primary memory not only slows
the  retrieve-common  operation,  but  also  increases  the 
program  complexity  and  causes  many  unnecessary  changes  to 
the  existing  software.  Therefore,  we  choose  alternative  (1) 
to  process  the  target  request. 

The algorithm for the modified DM is as follows.

Step 1:  Get the next message from the message queue and
find the sender of the message.

Step 2:  If the sender is the controller, then go to step 5.

Step 3:  If the sender is RECP, then go to step 8.

Step 4:  If the sender is CC, then go to step 11.

Step 5:  If this is not a retrieve-common transaction, then
go to step 11.

Step 6:  Identify and separate the source request and the
target request from the transaction. Hold the
target request and perform the directory
processing on the source request.

Step 7:  Send the source request with its address set to
RECP. Go to step 1.

Step 8:  If this is not the message which indicates the
completion of retrieving all the source records,
then go to step 11.

Step 9:  Get the corresponding target request and perform
directory processing on that target request.

Step 10: Send the target request with its address set to
RECP.

Step 11: Perform the original DM operation.

The  SSL  for  the  modified  DM  is  provided  in  Appendix  B. 
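The control flow of the modified DM can be sketched in Python as shown
below. This is an illustration only and not the SSL of Appendix B: the
message fields ("sender", "type", "traffic_id", "source", "target"),
the message type names and the three callback parameters are all
hypothetical. The sketch also assumes that DM returns to the message
queue after step 10, which the steps above do not state explicitly.

```python
from typing import Callable, Dict


def dm_step(message: dict, held: Dict[str, object],
            send_to_recp: Callable[[object, object], None],
            directory_processing: Callable[[object], object],
            original_dm: Callable[[dict], None]) -> None:
    """Handle one message taken from the DM message queue (step 1)."""
    sender = message["sender"]
    if sender == "controller" and message.get("type") == "RETRIEVE-COMMON":
        # Step 6: hold the target request, process the source request.
        source, target = message["source"], message["target"]
        held[message["traffic_id"]] = target
        # Step 7: send the source request with its address set to RECP.
        send_to_recp(source, directory_processing(source))
    elif sender == "RECP" and message.get("type") == "SOURCE-DONE":
        # Steps 9 and 10: release the held target request.
        target = held.pop(message["traffic_id"])
        send_to_recp(target, directory_processing(target))
    else:
        # Step 11: everything else follows the original DM operation.
        original_dm(message)


sent, passed_on, held = [], [], {}
args = (held, lambda req, addrs: sent.append((req, addrs)),
        lambda req: f"addresses({req})", passed_on.append)
dm_step({"sender": "controller", "type": "RETRIEVE-COMMON",
         "traffic_id": "t1", "source": "S", "target": "T"}, *args)
dm_step({"sender": "RECP", "type": "SOURCE-DONE", "traffic_id": "t1"}, *args)
print(sent)
```

Holding the unprocessed target request keyed by its traffic id
corresponds to alternative (1): no directory information has to be
kept for the target request while it waits.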

2.  The Record Processing Process (RECP)

RECP receives the requests and their address sets from DM and
performs the physical data operations on those requests. The
original physical-data-operation subprocess includes a control
function and a subfunction for each type of request. The
subfunctions are invoked by the control function according to the
type of request being processed.

In order to process the retrieve-common request, we have made two
modifications to RECP:

(1)  adding a new subfunction, the retrieve-common
subfunction, into the physical-data-operation
subprocess; and

(2)  adding a new subprocess, the hashing module, into
RECP.


a. The Retrieve-Common Subfunction

The purpose of the retrieve-common subfunction
is to direct the flow of control in the
physical-data-operation subprocess so that the
retrieve-common request can be processed correctly. The
differences between the retrieve-common subfunction and the
retrieve subfunction can be summarized as follows.

(1)  The  retrieve  subfunction  sends  the  retrieved  records 
to  the  PP,  whereas  the  retrieve-common  subfunction 
sends  the  retrieved  records  to  the  hashing  module. 


(2) In addition to sending a message to CC to indicate the
completion of the retrieval of physical data (as the
retrieve subfunction does), the retrieve-common
subfunction sends a message to notify DM that all
the source records have been processed.

The  algorithm  for  the  retrieve-common 
subfunction  is  as  follows. 

Step 1: Reserve a result buffer.

Step 2: For each address in the set of tracks which are
furnished by DM, fetch the track from the disk
and place it in the track buffer in the primary
memory.

Step 3: Examine the records in the buffer one-by-one. If
the record is marked for deletion, disregard it.
If the record does not satisfy the query,
disregard it. If a record satisfies the query,
then extract the values for the attribute names in
the target-list of the request and store this
information in the result buffer.

Step 4: When the result buffer is full, send the contents
of the buffer to the hashing module.

Step 5: Repeat steps 2, 3 and 4 until there are no more
addresses for the request.

Step 6: Send a message to CC to release the lock for this
request. If this is a source request, then send a
message to DM so that DM can process the target
request.

Step 7: Free the result buffer.

The SSL for the modified control function and the
retrieve-common subfunction is provided in Appendix C.
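
The following Python sketch mirrors the seven steps of the subfunction. It is a simplification under stated assumptions: tracks are modeled as in-memory lists of dictionaries, the query is a caller-supplied predicate, the message strings and the function name retrieve_common are hypothetical, and we assume that a partially filled buffer is also sent once the addresses are exhausted (the steps do not say so explicitly).

```python
def retrieve_common(tracks, predicate, target_list, buffer_size, is_source):
    """Scan tracks, project matching records, and flush full result buffers.

    Returns (batches sent to the hashing module, control messages sent).
    """
    batches, messages = [], []
    result_buffer = []                                   # Step 1
    for track in tracks:                                 # Steps 2 and 5
        for record in track:                             # Step 3
            if record.get("deleted") or not predicate(record):
                continue                                 # disregard the record
            # Keep only the attributes named in the target list.
            result_buffer.append({a: record[a] for a in target_list})
            if len(result_buffer) == buffer_size:        # Step 4
                batches.append(result_buffer)
                result_buffer = []
    if result_buffer:        # assumption: flush the last, partial buffer
        batches.append(result_buffer)
    messages.append("CC:release_lock")                   # Step 6
    if is_source:
        messages.append("DM:source_retrieve_finished")
    return batches, messages                             # Step 7: buffer is dropped
```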


b. The Hashing Module


The hashing module performs the hashing and
merge operations. The merged results are sent to the
controller. The module is invoked by the retrieve-common
subfunction of the physical-data-operation subprocess.
There are three procedures within this module: the hashing
procedure, the bucket-block tracking procedure and the
merging procedure.

(1) The Hashing Procedure. The hashing
procedure receives the records from the retrieve-common
subfunction of the physical-data-operation subprocess and
performs the hashing function on the value of the join
attribute of each record. The records and their hashed
results are stored in a result buffer. When the buffer is
full, its contents are passed to the bucket-block tracking
procedure for further processing.

The algorithm for the hashing procedure is
as follows.

Step 1: Reserve a result buffer.

Step 2: Get the data type of the value of the join
attribute from the record template.

Step 3: Extract a record from the input buffer which is
passed from the retrieve-common subfunction.

Step 4: Apply the appropriate hashing function to hash the
value of the join attribute of the record
according to its data type (see Chapter IV).

Step 5: Store the record and the hashed bucket value in
the result buffer.

Step 6: If the result buffer is full, then send the
contents of the result buffer to the bucket-block
tracking procedure.

Step 7: Repeat steps 3, 4, 5 and 6 until there are no more
records in the input buffer.

Step 8: Free the result buffer.

The  SSL  for  the  hashing  procedure  is  provided  in  Appendix  D. 
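
To illustrate steps 2 to 5, the sketch below selects a hashing function by the data type of the join attribute and pairs each record with its bucket value. The actual hashing functions are those of Chapter IV; the two used here (modulo for integers, a CRC-32 checksum for strings) and the bucket count of 8 are stand-ins chosen only for illustration, as are the names bucket_of and hash_records.

```python
import zlib

NUM_BUCKETS = 8   # hypothetical table size


def bucket_of(value, num_buckets=NUM_BUCKETS):
    """Hash a join-attribute value according to its data type."""
    if isinstance(value, int):
        return value % num_buckets
    if isinstance(value, str):
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        return zlib.crc32(value.encode("utf-8")) % num_buckets
    raise TypeError("unsupported join-attribute type: %r" % type(value))


def hash_records(records, join_attribute, num_buckets=NUM_BUCKETS):
    """Return (bucket value, record) pairs for one input buffer."""
    return [(bucket_of(r[join_attribute], num_buckets), r) for r in records]
```

Every backend must apply the same function to the same data type, since records with equal join values have to land in the same bucket number wherever they were hashed.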

(2) The Bucket-block Tracking Procedure. This
procedure stores the records (both the source records and
the target records) into blocks according to their bucket
values. It maintains one hashing table for the currently
processed request and one global table to store the
logical-hash-table addresses for all of the retrieve-common
requests in the system. The inputs to this procedure are the
records and their hashed bucket values, which come either
from the local hashing procedure or from the other backends.
A checklist is used to ensure that the hashed results of the
non-local target records are received from all of the other
backends. There is also an additional disk I/O buffer used
in this procedure to move the blocks of each bucket into and
out of the primary memory. The outputs from this procedure
are the logical addresses of the two hashing tables of the
source request and the target request, which are passed to
the merging procedure. The structures of the global table,
hashing table, bucket, and block have been described in
Chapter IV. After processing all of the local records, this
procedure groups the local target records together with
their bucket numbers, and then broadcasts them to all of the
other backends.

The algorithm for this procedure is as
follows.

Step 1: Create the global table and reserve a disk I/O
buffer.

Step 2: Get an input buffer of records. If the input
buffer contains source records, then go to step 5.

Step 3: If the input buffer contains local target records,
then go to step 6.

Step 4: If the input buffer contains the target records
received from the other backends, then go to step
8.

Step 5: Get the hashing table for the source request. Go
to step 7.

Step 6: Get the hashing table for the target request.

Step 7: Store the record into a bucket and perform the
bucket-block tracking operation (as described in
Chapter IV). Go to step 9.

Step 8: Perform the bucket-block tracking operations to
insert these incoming records into the target
hashing table.

Step 9: Repeat steps 2 to 8 until all records have been
processed.

Step 10: If the input buffer contains local target
records, then retrieve the local target records
from the target hashing table bucket-by-bucket
and broadcast them (with the bucket number) to
the other backends.

Step 11: If the input buffer contains non-local target
records, then get the logical address of the
hashing table of the source request. Pass the
logical addresses of the hashing tables of the
source request and the target request to the
merging procedure for the merging operation.

The  SSL  for  this  procedure  is  provided  in  Appendix  E. 
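
A rough in-memory picture of the hashing table and the checklist is given below. The real bucket and block structures are defined in Chapter IV and are moved to and from disk through the I/O buffer; here blocks are plain Python lists, and the class names HashingTable and TargetCollector, the block size, and the backend identifiers are all hypothetical.

```python
class HashingTable:
    """Buckets hold fixed-size blocks of records (sizes are assumptions)."""

    def __init__(self, num_buckets, block_size):
        self.block_size = block_size
        self.buckets = [[] for _ in range(num_buckets)]   # bucket -> list of blocks

    def insert(self, bucket, record):
        blocks = self.buckets[bucket]
        if not blocks or len(blocks[-1]) == self.block_size:
            blocks.append([])          # start a new block when the last one is full
        blocks[-1].append(record)

    def records(self, bucket):
        return [r for block in self.buckets[bucket] for r in block]


class TargetCollector:
    """Checklist of backends whose hashed target records are still awaited."""

    def __init__(self, table, other_backends):
        self.table = table
        self.pending = set(other_backends)

    def receive(self, backend, hashed_records):
        for bucket, record in hashed_records:
            self.table.insert(bucket, record)
        self.pending.discard(backend)
        return not self.pending        # True once every backend has reported
```

When receive returns True, the target hashing table is complete and the two table addresses can be handed to the merging procedure, which corresponds to step 11.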

(3) The Merging Procedure. This procedure performs
three functions:

(1) fetching the hashing tables of the source request and
the target request by their logical addresses, which
have been provided by the bucket-block tracking
procedure;

(2) performing the merging operation on the records of
both hashing tables (as described in Chapter IV); and

(3) sending the merged results to the controller.

The merged results contain only the
attribute-value pairs whose attribute names are specified in
the target-lists (of either the source request or the target
request). The extra attribute-value pairs (i.e., the join
attributes and their values, which have been added into the
target lists by the parser) are deleted by this procedure.
The SSL for the merging procedure is provided in Appendix E.
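
The merge and the removal of the extra join attributes can be sketched as follows. This is an assumption-laden illustration rather than the procedure of Chapter IV: hashing tables are modeled as dictionaries from bucket number to record list, the attribute names in the two target lists are assumed to be distinct, and the function name merge_buckets is hypothetical.

```python
def merge_buckets(source_buckets, target_buckets, src_attr, tgt_attr,
                  src_list, tgt_list):
    """Merge records bucket-by-bucket on equal join-attribute values.

    src_list and tgt_list are the user-specified target lists; the join
    attributes are dropped unless the user asked for them.
    """
    results = []
    for bucket, src_records in source_buckets.items():
        for s in src_records:
            # Only records in the same bucket can have equal join values.
            for t in target_buckets.get(bucket, []):
                if s[src_attr] == t[tgt_attr]:
                    merged = {a: s[a] for a in src_list}
                    merged.update({a: t[a] for a in tgt_list})
                    results.append(merged)
    return results
```

Note the equality test inside the bucket: two different join values may hash to the same bucket, so sharing a bucket is necessary but not sufficient for a match.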

C. THE MODIFIED MESSAGE-PASSING FACILITIES

In Chapter II we introduced the general format and
the different types of MBDS messages (see Figure 2.3 and
Figure 2.4). In order to accomplish the retrieve-common
request we have added two new message types, which are shown
in Figure 5.1.

D. EXECUTION OF A RETRIEVE-COMMON REQUEST, VIEWED VIA
MESSAGE-PASSING

In this section we describe the sequence of actions for
executing the retrieve-common request as it moves through
MBDS. The sequence of actions is described in terms of the
types of messages passed between the MBDS processes: REQP,
PP, DM, RECP and CC. The order in which messages are passed
is denoted alphabetically ('a' is first). The digits
following the ordering letter give the message type as
shown in Figures 2.4 and 5.1.

The sequence of actions for a retrieve-common request is
shown in Figure 5.2. First, the retrieve-common request comes
to REQP from the host (a1). REQP sends two messages to PP:
the number of requests in the transaction (b3) and the
aggregate operator of the request (c4). The third message


Message Type:  (32) Hashed Target Records
Source:        Record Processing
Destination:   Record Processing (other backends)
Explanation:   This message contains the bucket numbers
               of the target hashing table and all of
               the target records associated with
               their buckets.

Message Type:  (33) Source Retrieve Finished
Source:        Record Processing
Destination:   Directory Management (same backend)
Explanation:   This message is used to notify Directory
               Management that all of the source
               records have been retrieved. DM can then
               begin processing the target request.


Figure 5.1 The New MBDS Message Types.

sent by REQP is the parsed traffic unit, which goes to DM in
the backends (d6). DM sends the type-C attributes needed by
the retrieve-common request to CC (e20). Once an attribute
is locked and descriptor search can be performed, CC signals
DM (f26). DM then processes the source request (the target
request is held for now). DM performs descriptor search and
signals CC to release the lock on that attribute (g23). DM
sends the descriptor ids for the request to the other
backends (h15). The DM processes in the other backends send
their descriptor ids to the DM process residing in this
backend (i15). DM then uses its own descriptors and the
descriptors received from the other backends to form
descriptor-id groups. DM now sends the descriptor-id groups
for the source request to CC (j21). Once the descriptor-id
groups are locked and cluster search can be performed, CC
signals DM (k27). DM then performs cluster search and
signals CC to release the locks on the descriptor-id groups
(m25). Next, DM sends the cluster ids for the retrieval to
CC (n22). Once the cluster ids are locked, and the request
can proceed with address generation and the rest of the
source-request execution, CC signals DM (o28). DM then
performs address generation and sends the source request and
the address set to RECP (p16). Once the retrieval request
has executed properly, RECP sends a message to DM to start
processing the target request (r33). DM processes the target
request in the same way as the source request (i.e., phases
e20 to p16). The retrieved records are processed by the
hashing module in RECP. Once the local target records have
been processed properly, the hashing module broadcasts the
hashed target records (grouped by bucket numbers) to the
other backends via RECP (s34). The hashing modules in the
other backends send their hashed target records to the
hashing module of this backend (t34). Once the comparing and
merging operations have been performed by the hashing
module, the results are sent to PP (u2). PP then forwards
the results to the host.


VI. CONCLUSION


A. REVIEW AND SUMMARY

The multi-backend database system (MBDS) in the
Laboratory for Database System Research at the Naval
Postgraduate School is designed to overcome the
performance-gain and capacity-growth problems of either the
traditional database system or the
single-backend-software database system. The original MBDS
supported four primary operations, namely, RETRIEVE, DELETE,
UPDATE and INSERT. This thesis presented the design and
implementation of the fifth primary operation, the
RETRIEVE-COMMON operation. The retrieve-common operation is
used to merge two files by common attributes. Our major
goal is to maximize the utilization of the existing system
and to minimize the effects on it.

We have analyzed several possible design alternatives
and then selected the best one for our design and
implementation approach. The key issues for the selection
are conformance to the design requirements, the design
issues of MBDS and the time complexities of implementation.
Our design and implementation are based on the bucket-hashing
approach. Each backend performs a partial merge of its
portion of the source records with the entire set of target
records and sends its results to the controller. The
controller forwards the final results to the user at the
host computer.

Based on the selected design and implementation
approaches, the operations of the retrieve-common request
are executed in four phases: the request-preprocessing
phase, the record-retrieving phase, the hashing-and-storing
phase and the merging phase. The retrieve-common request
is first parsed by the parser into a transaction of two
retrieval requests (each of the retrieve-common request
type). Then, the parsed requests are reformatted into the
required message formats and broadcast to all the backends
by the composer of the controller. Each backend receives
the formatted messages of the transaction, separates the
source request and the target request, and then performs the
directory operations and retrieves the records according to
the queries specified in the requests. The retrieved
records of the source record set and the records of the
target record set are separately hashed on their common
attribute values and then stored into buckets of the source
hashing table and the target hashing table, respectively.
The hashed records of the source buckets and the records of
the target buckets are compared and merged bucket-by-bucket.
The merged results are sent to the controller from all of
the backends. The controller then forwards the results to
the host computer. In order to accomplish the operations of
the retrieve-common request, we have designed a hashing
module into the record-processing process of each backend.
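
The division of work among the backends summarized above can be illustrated with a small simulation. It is a sketch under simplifying assumptions: the join values are integers hashed by modulo, the broadcast of target records is modeled by concatenating the per-backend lists, both record sets use the same attribute name, and the function names partial_merge and run_backends are hypothetical.

```python
def partial_merge(source_part, all_targets, attr, num_buckets=4):
    """Join one backend's source records with the entire target set."""
    buckets = {}
    for t in all_targets:                       # hash the targets into buckets
        buckets.setdefault(t[attr] % num_buckets, []).append(t)
    return [(s, t)
            for s in source_part
            for t in buckets.get(s[attr] % num_buckets, [])
            if s[attr] == t[attr]]              # sharing a bucket is not enough


def run_backends(source_parts, target_parts, attr):
    """Simulate every backend and collect what the controller would receive."""
    # After the broadcast, each backend holds the target records of all backends.
    all_targets = [t for part in target_parts for t in part]
    merged = []
    for part in source_parts:                   # one iteration per backend
        merged.extend(partial_merge(part, all_targets, attr))
    return merged
```

Because every source record resides on exactly one backend and is compared against the complete target set there, the union of the partial results equals the full merge, with no pair produced twice.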

To integrate our design into MBDS, we have made
several modifications. The modified components are:

(1) the message-passing facilities,

(2) the parser of the request-preparation process of the
controller, and

(3) the directory-management process and the
record-processing process of each backend.

The algorithms for the modifications and the program
specifications (SSL) are provided in Chapters IV and V
and the Appendices.
B. FUTURE WORK


The next step in the design and implementation of the
retrieve-common operation is the modification of the MBDS
software according to the SSL given in the appendices. There
are two classes of modifications. First, existing software
is updated to reflect the changes necessary for the
retrieve-common operation. In the system, new message types
must be defined, the request-preparation and post-processing
processes of the controller are changed, and the
directory-management process is changed to correctly
sequence and execute the retrieve-common request. Second,
new software is written to handle the processing of the
retrieve-common request, i.e., the hashing module. In the
system, the software for the hashing module is coded, tested,
and integrated into the record-processing process of each
backend.
APPENDIX A

THE MODIFIED REQUEST PREPARATION PROGRAM SPECIFICATIONS

In this appendix, we present only the modified portions
of the Request Preparation process. The original SSL is in
[Ref. 11: p. 87].

A. THE LEX MODIFICATIONS

/*******************************************************
 *
 * We have added the regular expression for the token
 * COMMON into LEX. The rest of LEX remains unchanged.
 * The original specification is in the lsrc file.
 *
 *******************************************************/

. (The original lsrc specifications.)
.

BY      {
            return (TOKBY);
        }

COMMON  {
            return (TOKCOM);
        }

"<="    {
            return (LS);
        }

. (The original lsrc specifications.)


B. THE YACC MODIFICATIONS

In this section, we present only the SSL for the
modified portion of the parser. The original program is in
the ysource file.

procedure yyparse ();

/*******************************************************
 * This procedure is used to parse the output of LEX.
 * The modification of the yyparse procedure converts
 * the retrieve-common request from a single request
 * into a transaction of two requests.
 *
 * Data structures and variables used in this
 * procedure:
 * 1. No new data structures are introduced by this
 *    modification.
 * 2. com_flag_1, com_flag_2, com_flag_3, com_flag:
 *    Boolean variables which are used to indicate the
 *    different conditions of the retrieve-common
 *    request.
 * 3. new_tbl_ptr:
 *    A pointer to a request table.
 *    The request table is defined in the commdata.def
 *    file as a Reqtbl_definition structure.
 * 4. com_atrb_1, com_atrb_2:
 *    Character strings to hold the common attribute.
 *******************************************************/

/* The following is the modified portion of yysource. */

/* Add a new token in the specification. */

%token [str] TOKCOM    /* common */

/* Add new derivations and program specifications. */

transaction : beg_tran lines
                  /* No changes in this part */
                  /* of the transaction rule. */

            | beg_single_req line
                  if com_flag
                  then
                      /* This is a retrieve-common
                         request. */
                      Perform the operations which are
                      specified under the beg_tran
                      lines;
                  else
                      /* Perform original operations. */
                  end if;

end_req     : EOR
                  /* Clear the com_flags. */
                  com_flag = false;
                  com_flag_3 = false;

req_forms   : delete query
            |
            | ...   /* These are the
                       original derivations. */
            |
            | req_forms common target_list req_forms;

common      : TOKCOM
                  perform CHECK_REQUEST_TYPE (req_tbl, OK);
                  /* Check if the first request is
                     a retrieve. */
                  if OK
                  then
                      com_flag = com_flag_1 = true;
                  else
                      perform ERROR_PROCEDURE;
                  end if;

attribute   : LETTERFIRST
                  if com_flag_1
                  then
                      /* This attribute is the common
                         attribute of the source
                         request. Copy the attribute
                         into com_atrb_1. */
                      perform strcpy (com_atrb_1,
                                      attribute);

                      /* Put the common attribute of
                         the source request into
                         the target list and
                         convert the request table from
                         the form of a single request to
                         the form of a transaction. */
                      perform CONVERT (tbl_ptr->req_tbl,
                                       com_atrb_1,
                                       traf_id, req_cnt,
                                       new_tbl_ptr->req_tbl);
                      com_flag_2 = true;
                      com_flag_1 = false;
                      /* com_flag = true */
                  else
                      if com_flag_2
                      then
                          /* This attribute is the
                             common attribute of the
                             target request. */
                          perform strcpy (com_atrb_2,
                                          attribute);
                          com_flag_3 = true;
                          com_flag_2 = false;
                      else
                          if com_flag_3
                          then
                              /* This is the first
                                 attribute of the target
                                 list of the target
                                 request. */
                              insert com_atrb_2 into the
                                  target request table;
                              insert the attribute into
                                  the target request table;
                          end if;
                          /* Perform the original
                             operations. */
                      end if;
                  end if;

retrieve    : TOKRETRIEVE
                  if com_flag_3
                  then
                      perform ERROR_PROCEDURE;
                  else
                      if com_flag
                      then
                          /* Change the type to be
                             RETRIEVE_COMMON. */
                      end if;
                  end if;
                  /* Perform the original operations. */

delete      : TOKDELETE
                  if com_flag
                  then
                      perform ERROR_PROCEDURE ();
                  else
                      /* Perform the original operations. */
                  end if;

insert      : TOKINSERT
                  if com_flag
                  then
                      perform ERROR_PROCEDURE ();
                  else
                      /* Perform the original operations. */
                  end if;

update      : TOKUPDATE
                  if com_flag
                  then
                      perform ERROR_PROCEDURE ();
                  else
                      /* Perform the original operations. */
                  end if;

/* Perform the original operations. */
end procedure yyparse;

procedure CONVERT (input:  source_req_table, source_com_atr,
                           traf_id, request_number,
                           index_req_ptr;
                   output: target_req_table, request_number,
                           index_req_ptr);

/*******************************************************
 * This procedure is used to rearrange the contents
 * of the request table of a request which is the
 * source retrieve of a RETRIEVE_COMMON request.
 * This procedure performs the following tasks:
 * 1. Rearrange the source request table.
 * 2. Make the common attribute of the source request
 *    the first attribute of the target list.
 * 3. Create a request table for the target request
 *    and return it to the calling procedure.
 *
 * Data structures and variables used in this
 * procedure are:
 * 1. source_req_table, target_req_table:
 *    The request tables of the source request and
 *    the target request.
 * 2. new_table:
 *    An array of Reqtbl_definition structures.
 * 3. traf_id:
 *    A character string which is the traffic id of
 *    a transaction.
 * 4. request_number:
 *    An integer which is used to indicate the
 *    number of requests in a traffic unit.
 * 5. index_req_ptr:
 *    A pointer to a parsed traffic unit, which is
 *    an array of Reqtbl_definition structures.
 * 6. source_com_atr:
 *    A character string which is the common
 *    attribute of the source request.
 *******************************************************/

/* Use a new request table, new_table, to hold the
   contents of the source_req_table. */
new_table[0] = ECR;
new_table[1] = str_to_num (traf_id);
new_table[2] = request_number;
new_table[3] = rcuttype;    /* Defined in yyparse(). */
new_table[4] = RETRIEVE_COMMON;

/* Copy the contents of the source request table into
   the new_table. */
i = 5;
repeat
    new_table[i] = source_req_table[i];
    i = i+1;
until source_req_table[i] = EOQ;

/* Insert the common attribute into the new_table. */
new_table[i] = source_com_atr;
i = i+1;

/* Copy the rest of the source_req_table into
   the new_table. */
repeat
    new_table[i] = source_req_table[i-1];
    i = i+1;
until source_req_table[i-1] = null;

/* Put an end-of-request marker, EOR,
   into the new_table. */
new_table[i] = EOR;

/* Copy the new_table into the source_req_table. */
i = 0;
repeat
    source_req_table[i] = new_table[i];
    i = i+1;
until source_req_table[i] = EOR;

/* Increase the request number, and create a request
   table for the target request. */
request_number = request_number+1;
perform ALLOCATE_REQ_TABLE (target_req_table);

/* Put the target_req_table into the
   parsed traffic unit. */
index_req_ptr->req_tbl[request_number-1]
    = target_req_table;

/* Return the request number, target_req_table and
   index_req_ptr to the calling procedure. */
end procedure CONVERT;
procedure CHECK_REQUEST_TYPE (input: req_tbl; output: OK);

/*******************************************************
 * This procedure is used to check the syntax of a
 * retrieve_common request. If the request type is
 * not retrieve, set OK to false. Otherwise, set OK
 * to true. Return OK to the calling procedure.
 *******************************************************/

end procedure CHECK_REQUEST_TYPE;


procedure ERROR_PROCEDURE ();

/*******************************************************
 * This procedure is used whenever there is a syntax
 * error in the request.
 * This procedure will print an error message and
 * terminate the parser operations.
 *******************************************************/

end procedure ERROR_PROCEDURE;


APPENDIX B

THE MODIFIED DIRECTORY MANAGEMENT PROGRAM SPECIFICATIONS

The original SSL for the Directory Management process is
in [Ref. 13: pp. 82-102]. In this appendix, we present only
those procedures which are affected by the retrieve-common
request.


procedure DM_ParsedTrafUnit ();

/*******************************************************
 * This procedure is used when Request Preparation
 * (REQP) sends a traffic unit to Directory
 * Management (DM). The original procedure is in
 * the tu.c file.
 * We add an if statement to differentiate between
 * the retrieve-common request type and the other
 * request types.
 *
 * No new variables are introduced in this procedure.
 *******************************************************/

/* Get a pointer to the parsed traffic unit. */
ti_ptr = DM_R$ParsedTrafUnit ();

/* Get a pointer to the record template
   of this traffic unit. */
tmpl_ptr = get_tmpl_ptr (ti_ptr->ti_dbid);

/* Get a pointer to the attribute table. */
AT = AT_lookuptbl (ti_ptr->ti_dbid);

/* Get the type-C attributes for the traffic unit
   and send them to DS_CC. */
perform DM_TypeC_Attrs_TrafUnit ();

/* Process the requests of this traffic unit. */
ri_ptr = ti_ptr->ti_first_req_pointer;

/* Get the type of the first request of
   this traffic unit. */
if req_type = RETRIEVE_COMMON
then
    /* We will only process the source request. */
    /* The target request will not be processed */
    /* until the record-processing process has  */
    /* retrieved all of the source records.     */

    /* Perform the descriptor search processing. */
    done = NINS_SH_DESC (&rid, ri_ptr, tmpl_ptr, AT);
    if done
    then
        /* Broadcast the descriptor ids to the
           other backends. */
        DM_Broadcast_DIDs (&rid);
    end if;
else
    /* This is not a retrieve-common transaction, so
       process the requests of the traffic unit
       one-by-one. */
end if;

end procedure DM_ParsedTrafUnit;


procedure DM_RecP_Msg ();

/*******************************************************
 * This procedure is used when there is a message
 * for DM from RECP (in the same backend).
 * We add a new message type to indicate that all
 * of the source records have been retrieved.
 *
 * No new data structures or variables are used.
 * The original procedure is called by
 * DM_THIS_BE_MSG () and is in the dirman.c file.
 *******************************************************/

/* Get the message type. */
MsgType = DM_R$Type;
switch (MsgType)
    case OldNewValue:
        perform DM_OldNewValues ();
    case UpdFinished:
        perform DM_UpdFinished ();
    case Source_finished:
        /* This is the message which indicates the
           completion of the retrieval of all the
           source records. */
        perform DM_Source_finished (msg);
end switch;

end procedure DM_RecP_Msg;


procedure DM_Source_finished (input: message);

/*******************************************************
 * This procedure is used when DM receives a message,
 * from RECP, which indicates the completion of the
 * retrieval of all of the source records. DM is now
 * ready to process the target request.
 *
 * This procedure is called by DM_RecP_Msg ().
 *******************************************************/

/* Receive the request id from the message. */
perform DM_R$Rid (source_req_id);

/* Get a pointer to the traf_info entry by the
   source_req_id. */
ti_ptr = DM_TiFind (source_req_id);

/* Get a pointer to the req_info entry for the source
   request. */
source_req_info_ptr = DM_RiFind (req_id, ti_ptr);

/* Get a pointer to the req_info entry for the target
   request by the source_req_info_ptr. */
target_ri_ptr = source_req_info_ptr->next_req_info;

/* Get the request id of the target request. */
target_req_id = Find_request_id (target_ri_ptr);

/* Perform the directory operations on the
   target request. */

/* Get the record template for the target request. */
tmpl_ptr = get_tmpl_ptr (ti_ptr->ti_dbid);

/* Get a pointer to the attribute table. */
AT = AT_lookuptbl (ti_ptr->ti_dbid);

/* Perform the descriptor search processing. */
done = NIMS_SR__DISC (&rid, ri_ptr, tmpl_ptr, AT);
if done
then
    /* Broadcast the descriptor ids to the other
       backends. */
    perform DM_Broadcast_DIDs (&rid);
end if;

end procedure DM_Source_finished;


APPENDIX C

THE MODIFIED RECORD PROCESSING PROGRAM SPECIFICATIONS


In this part of the appendix, we have added the
retrieve-common subfunction into the control function of the
physical-data-operation subprocess of the record-processing
process (RECP). We present only the modified portion
of the original RECP in this appendix.


procedure ReqProcessing (input: MsgType);

/*******************************************************
 * This procedure is used to process requests according
 * to the request type.
 *
 * We add the retrieve-common request type into the
 * switch statement as one of the optional cases.
 *
 * This procedure is called by the procedure RP_DM. The
 * original procedure is in the recproc.c file.
 *******************************************************/

/* Get the request type. */
switch (request_type)

    RETRIEVE_COMMON:
        perform SI_RetDel ();
        /* From this point, we use the same
           procedures as used for the
           RETRIEVE request processing. */

/* Now, back to the original ReqProcessing (). */
end procedure ReqProcessing;
procedure RP_ReadCompleted ();

/*******************************************************
 * This procedure is used when a physical read is
 * completed. We add the retrieve-common request
 * type into its switch statement as one of the
 * request-type cases.
 *
 * This procedure is called by the procedure RP_RP.
 * The original procedure is in the recproc.c file.
 *******************************************************/

/* Get the request type of this request. */
switch (request_type)

    RETRIEVE_COMMON:
        perform RC_Ret ();

    RETRIEVE:
        perform RC_Ret ();

    /* Now, back to the original processing. */
end switch;

end procedure RP_ReadCompleted;

procedure RB$SEND_COMPLETION (input: RB_ptr, reqtype);

/******************************************************
 * This procedure does the following tasks:
 * 1. Send the contents of the result buffer to
 *    either the hashing module or the controller,
 *    depending on the request type.
 * 2. If this is a source request of a retrieve-common
 *    request, then send a message to DM indicating
 *    that all of the source records have been
 *    retrieved.
 * 3. Send a message to CC to release the locks on
 *    the database for this request.
 * 4. Free the result buffer space after the
 *    contents of the result buffer have been sent.
 ******************************************************/


/******************************************************
 * All of the data structures and variables are the
 * same as the original procedure.
 * This procedure is called by the procedure RC_Ret().
 * The original procedure is in the recproc.c file.
 ******************************************************/

/* Get the request id by the result buffer pointer RB_ptr. */
request_id = RB_ptr->RB_rid;
if reqtype = RETRIEVE_COMMON
then
    if the result_buffer is full
    then
        /* Send the contents of the result buffer */
        /* to the hashing module and reinitialize */
        /* the buffer size to 0. */
        perform HASH_FUNC (request_id, result, result_length);
        result_length = 0;
    end if;

    if this is the last result buffer for this request
    then
        /* Send the result buffer to the hashing module. */
        perform HASH_FUNC (request_id, result, result_length);

        if this is a source request
        then
            /* Send a message to DM indicating */
            /* that all of the source records */
            /* have been retrieved. */
            perform DM_FinReq$RP_S (request_id);
        end if;

        /* Free the result buffer space. */
        perform Recp_free (request_id);

        /* Send a message to CC to release */
        /* the locks for this request. */
        perform CC_FinReq$RP_S (request_id);
    end if;
else
    /* This request is not a retrieve-common request.
       Now, back to the original processing. */
end if;

end procedure RB$SEND_COMPLETION;

procedure XTRACT (input: TRACK_BUFFER, indexB, result2,
                         request, tmpl_ptr, target_ptr;
                  output: result2);

/******************************************************
 * This procedure extracts the attribute names and
 * values which correspond to the target list
 * of a record.
 * This procedure is called by the procedure
 * RETR_PROCESSING().
 * The original procedure is in the rbabs.c file.
 * We add an end-of-record marker, EOR, at the end
 * of every record.
 ******************************************************/

/* Process all statements of the original procedure
   until the end of the outermost while loop. */
/* Add the following processing. */
if the reqtype = RETRIEVE_COMMON
then
    put the EORecord marker into the result buffer;
end if;

/* Now, back to the original processing. */

end procedure XTRACT;


procedure RB$PUT_SEND (input: RESULT_BUFFER, result,
                              length_of_result);

/******************************************************
 * This procedure puts the results for a request
 * into the result buffer. If the result buffer is
 * full, then the contents of the buffer are sent to
 * the controller or the hashing module and the
 * length of the buffer is set to 0.
 * This procedure is called by the procedure
 * RETR_PROCESSING().
 * The original procedure is in the rbabs.c file.
 ******************************************************/

if the result buffer is full
then
    /* Find the request type in the result buffer. */
    reqtype = FIND_req_type (result_buffer);
    if reqtype = RETRIEVE_COMMON
    then
        /* Send the results to the hashing module. */
        perform HASH_FUNC (result_buffer);
    else
        /* Send the results to the controller. */
        perform RES$CNTL$RP_S (request_id, results,
                               length_of_result);
    end if;

    length_of_result = 0;
else
    /* Store the results into the result buffer. */
    /* Now, back to the original processing. */
end if;

end procedure RB$PUT_SEND;


procedure RP_CNL_ANOTHER_BE_MSG ();

/******************************************************
 * The purpose of this procedure is to process
 * the messages received from the controller or
 * the other backends.
 * This procedure is modified for processing the
 * hashed information of the non-local target
 * records.
 * The original procedure is in the recproc.c file.
 ******************************************************/

/* Get the message type. */
perform MsgType = Type$RP_R;
case MsgType of

    Bucket_info:
        /* This message is the hashed information */
        /* for the non-local target records. */
        perform PROCESS_BE_TARGET();

        /* This procedure should return the sender, */
        /* the request_id of the target request */
        /* and whether or not this is the last */
        /* message from this backend. */

        /* Check to see if all the target records */
        /* of all the other backends have been */
        /* received. */
        if LAST_MSG
        then
            perform CHECK_RECEIVE_MSG (sender,
                                       request_id, ALL_RECEIVED);
        end if;

        if ALL_RECEIVED
        then
            perform START_TO_MERGE (request_id);

            /* The called routine will perform */
            /* the merging operation and send the */
            /* results to the controller. */
        end if;

    /* Now, back to the original processing. */

end case;

end procedure RP_CNL_ANOTHER_BE_MSG;

procedure PROCESS_BE_TARGET (input: message;
                             output: sender, request_id,
                                     LAST_RECORD);

/******************************************************
 * This procedure is called to process the message
 * which contains the hashed bucket information of
 * the non-local target records.
 * This procedure will return the sender of the
 * message, the request id of those non-local
 * records and a boolean variable, LAST_RECORD, to
 * indicate that all of the target records from the
 * sending backend have been received.
 *
 * Data structures and variables used in this
 * procedure are:
 * 1. LAST_RECORD: A boolean variable which is
 *                 used to indicate the end of
 *                 this request.
 * 2. message: A character string which is used
 *             to store the hashed results of
 *             target records and is sent from
 *             the other backends.
 ******************************************************/

/* Get the sender of the message. */
perform GET_MSG_SENDER (sender);

/* Get the request id of the request. */
perform GET_REQUEST_ID (request_id);

/* Now, check the global table to find the address */
/* of the hashing table for this request. */
perform CHECK_GLOBAL_TABLE (request_id, hash_table,
                            NEW_REQUEST);

NEW_RECORD = true;

/* Since the message is an array of characters, */
/* we have to bypass the header to get the record */
/* information. If this message is the last message */
/* of the sending backend, then there will be an */
/* end-of-request marker, EORequest, in front */
/* of the end-of-message marker. */

I = the_integer_which_stands_for_the_index_where_record_start;

/* Get the bucket numbers and their associated */
/* records from the message, then insert them into */
/* the correct buckets of the hashing table. */

while ((not end of message) and (not end of request)) do
    /* Get the bucket number of the record and the */
    /* record itself from the message, and then */
    /* store the record into the appropriate bucket */
    /* of the hashing table by using the */
    /* bucket number. */
    perform GET_BUCKET_NUMBER (message, I, bucket_number);
    perform GET_A_RECORD_SET (message, I, set);
    perform STORE_RECORD_IN_HASH_TABLE (hash_table,
                                        bucket_number, set, NEW_RECORD);
    NEW_RECORD = false;
end while;

if EORequest
then LAST_RECORD = true;
else LAST_RECORD = false;
end if;

end procedure PROCESS_BE_TARGET;
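
The message-unpacking loop above can be illustrated with a minimal Python sketch. The marker values and the exact message layout (bucket number, EOV, record set, EORecord, repeated) are assumptions made only for this illustration; the specification does not fix them, and the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical one-character markers; their real values are not given here.
EOV, EORECORD, EOREQUEST, EOMESSAGE = "\x1f", "\x1e", "\x1d", "\x1c"

def process_be_target(message, start, hash_table):
    """Store each (bucket number, record set) pair of message in hash_table.

    Returns True when an end-of-request marker precedes the
    end-of-message marker, i.e. this was the sender's last message.
    """
    i = start  # index of the first bucket number, just past the header
    while message[i] not in (EOREQUEST, EOMESSAGE):
        end_of_bucket = message.index(EOV, i)
        bucket_number = int(message[i:end_of_bucket])
        end_of_record = message.index(EORECORD, end_of_bucket)
        # The record set holds the common attribute value and the record.
        record_set = message[end_of_bucket + 1:end_of_record]
        hash_table[bucket_number].append(record_set)
        i = end_of_record + 1
    return message[i] == EOREQUEST
```

The loop stops as soon as either marker is met, which is why the two conditions of the while statement are combined with "and".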


procedure START_TO_MERGE (input: request_id);

/******************************************************
 * This procedure is called when the target record
 * set has been received from all of the other
 * backends.
 * The input request_id is the request id of the
 * target request.
 * The data structures and the variables used in
 * this procedure are:
 * 1. TARGET_TABLE: The hashing table for the
 *                  target request.
 * 2. SOURCE_TABLE: The hashing table for the
 *                  source request.
 * 3. target_id: The request id of the target
 *               request.
 * 4. source_id: The request id of the source
 *               request.
 ******************************************************/

target_id = request_id;

/* Get the source request id. */
perform GET_SOURCE_ID (target_id, source_id);

/* Get the hashing table of the source request. */
perform CHECK_GLOBAL_TABLE (source_id, global_table,
                            source_hash_table,
                            NEW_REQUEST);

/* Get the hashing table of the target request. */
perform CHECK_GLOBAL_TABLE (target_id, global_table,
                            target_hash_table,
                            NEW_REQUEST);

/* Merge the records of these two requests and send */
/* the results to the controller. */
perform MERGE (source_id, source_hash_table.address,
               target_hash_table.address);

end procedure START_TO_MERGE;
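
The MERGE procedure itself is not specified in this part of the appendix. As a rough illustration of what merging two hashing tables by common attribute value can look like, the following Python sketch compares the two tables bucket by bucket. The table layout (a list of buckets, each a list of (common value, record) pairs) is an assumption made for the example only.

```python
def merge(source_table, target_table):
    """Pair source and target records that share a common attribute value.

    Both tables are assumed to use the same hashing function, so only
    buckets with the same bucket number need to be compared.
    """
    results = []
    for source_bucket, target_bucket in zip(source_table, target_table):
        for source_value, source_record in source_bucket:
            for target_value, target_record in target_bucket:
                if source_value == target_value:
                    results.append((source_record, target_record))
    return results
```

Because records with equal values always hash to the same bucket number, no pair is missed by restricting the comparison to matching buckets.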


procedure GET_SOURCE_ID (input: request_id;
                         output: request_id);

/******************************************************
 * This procedure is used to find the request id for
 * the source request by using the request id of the
 * target request.
 * Recall that the source request and the target
 * request have the same traffic id; the difference
 * between them is that the request number of the
 * source request is less than that of the target
 * request by 1.
 ******************************************************/

end procedure GET_SOURCE_ID;
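
The rule stated in the comment above can be sketched in a few lines of Python. The representation of a request id as a (traffic id, request number) pair is an assumption made for the illustration; the actual MBDS structure is not shown in this appendix.

```python
from typing import NamedTuple

class RequestId(NamedTuple):
    traffic_id: int      # shared by the source and the target request
    request_number: int  # position of the request within the traffic unit

def get_source_id(target_id: RequestId) -> RequestId:
    """Return the id of the source request paired with target_id."""
    return target_id._replace(request_number=target_id.request_number - 1)
```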


procedure CHECK_RECEIVE_MSG (input: sender, request_id;
                             output: ALL_RECEIVED);

/******************************************************
 * This procedure is used to check whether all
 * of the non-local target records have been
 * retrieved from all of the other backends for
 * a particular request. If all of the non-local
 * target records have been received, then
 * ALL_RECEIVED is set to true. Otherwise,
 * ALL_RECEIVED is set to false.
 ******************************************************/

end procedure CHECK_RECEIVE_MSG;
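
One possible way to keep the bookkeeping that CHECK_RECEIVE_MSG needs is sketched below in Python. The class name, the backend names and the per-request set are hypothetical; the sketch only assumes that the procedure is called once a backend's last message for a request has arrived.

```python
class ReceiveTracker:
    """Remember which backends have sent their last message per request."""

    def __init__(self, other_backends):
        self.other_backends = frozenset(other_backends)
        self.finished = {}  # request id -> set of backends that are done

    def check_receive_msg(self, sender, request_id):
        """Record that sender is done; return the ALL_RECEIVED flag."""
        done = self.finished.setdefault(request_id, set())
        done.add(sender)
        return done >= self.other_backends
```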


procedure CHECK_GLOBAL_TABLE (input: request_id;
                              output: hash_table,
                                      NEW_REQUEST);

/******************************************************
 * This procedure is used to check whether the request
 * is a new request by checking if the request id is
 * in the global table. If the id is found, then set
 * the value of NEW_REQUEST to false and return the
 * NEW_VALUE and the hash_table of the request.
 * This procedure has been defined in HASH_FUNC().
 ******************************************************/

end procedure CHECK_GLOBAL_TABLE;
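
A small Python sketch of the global-table lookup follows. The dictionary representation, the bucket count and the decision to create an empty hashing table for a request that is not yet registered are assumptions of this example; the comment above only describes the case in which the id is found.

```python
NUMBER_OF_BUCKETS = 8  # hypothetical size, chosen for the example

global_table = {}  # request id -> hashing table (a list of buckets)

def check_global_table(request_id):
    """Return (hash_table, new_request) for request_id."""
    new_request = request_id not in global_table
    if new_request:
        # Assumption: a new request gets a fresh, empty hashing table.
        global_table[request_id] = [[] for _ in range(NUMBER_OF_BUCKETS)]
    return global_table[request_id], new_request
```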


procedure GET_BUCKET_NUMBER (input: message, index;
                             output: index, bucket_number);

/******************************************************
 * This procedure is used to extract the bucket
 * number from the message, then return the
 * bucket_number and the incremented index to its
 * caller.
 * Data structures and variables used in this
 * procedure:
 * 1. bucket: A character string representation
 *            of the bucket number.
 * 2. j: A general purpose index.
 ******************************************************/

j = 0;
repeat
    bucket[j] = message[index];
    index = index+1;
    j = j+1;
until message[index] = EOV;

perform STRING_TO_INTEGER (bucket, bucket_number);

end procedure GET_BUCKET_NUMBER;


procedure GET_A_RECORD_SET (input: message, I;
                            output: set);

/******************************************************
 * This procedure is used to extract the common
 * attribute value of a record and the record itself
 * from the message which contains the hashed bucket
 * information of the non-local target records.
 *
 * The data structures and the variables used in
 * this procedure are:
 * 1. set: An array which contains the common
 *         attribute value of a record and the
 *         record itself.
 * 2. J: A general purpose index.
 ******************************************************/

J = 0;
repeat
    set[J] = message[I];
    I = I+1;
    J = J+1;
until message[I-1] = EORecord;

end procedure GET_A_RECORD_SET;


APPENDIX D

THE HASHING PROCEDURE PROGRAM SPECIFICATIONS

procedure HASH_FUNC (input: request_id, result, length;
                     output: request_id, hashed_result,
                             length_hashed_result);

/******************************************************
 * The purpose of this procedure is to hash the value
 * of the join attribute into a bucket of the hash
 * table.
 * A hash buffer is reserved to store the hashed
 * results.
 * Data structures and variables used in this
 * procedure are:
 * 1. hash_buffer: A variable of the data type
 *                 hashing_buffer which is used
 *                 to store the records and their
 *                 hashed bucket values, and is
 *                 defined in hashing_module.def.
 * 2. RP_rid_info: The information for a request.
 *                 This structure is defined in
 *                 the commdata.def file.
 * 3. RP_rid_ptr: A pointer to the data structure
 *                of type RP_rid_info.
 * 4. req_tbl_ptr: A pointer to a request table.
 *                 The request table is defined in
 *                 the commdata.def file as a
 *                 REQtbl_definition structure.
 * 5. temp_entry: A variable of data type rt_ntry
 *                which is defined in commdata.def.
 * 6. tem_ptr: A pointer to temp_entry.
 * 7. rt_entry: A pointer to a field of RP_rid_info.
 *              The type of this field is rt_ntry.
 ******************************************************/

/* Check if the request id is a new request. */
if new request
then
    /* Get the record template to find the value */
    /* type (i.e., integer, string or float) of the */
    /* common attribute value. */
    perform FIND_RP_rid_info (request_id, RP_rid_ptr);

    /* Get a pointer to the request table from the */
    /* RP_rid_info. */
    req_tbl_ptr = RP_rid_ptr -> RP_ri_req;

    /* Find the attribute name from the request table. */
    perform FIND_COMMON_ATTRIBUTE (req_tbl_ptr,
                                   attribute_name);

    /* Get a pointer to the entry */
    /* of the template for the common attribute. */
    tem_ptr = RP_rid_ptr -> RP_ri_tmpl_ptr -> rt_entry;

    /* Get the value type of the common attribute */
    /* from the record template. */
    if tem_ptr->temp_entry.value_data_type = 's'
    then
        value_type = string;
    else
        /* If the value type is integer, then */
        /* we decide which hashing function to */
        /* use. */
        MAX = tem_ptr.value_c1;  /* The possible maximum */
                                 /* value for this attribute. */
        MIN = tem_ptr.value_c2;  /* The possible minimum */
                                 /* value for this attribute. */

        if (MAX-MIN) < the_number_of_buckets
        then
            value_type = small_integer;
        else
            range = (MAX-MIN) / the_number_of_buckets;
            value_type = large_integer;
        end if;
    end if;
end if;

/* Allocate a buffer to store the hashed results. */
perform ALLOCATE_HASH_BUFFER (hash_buffer);

/* Note: we may not want to call this */
/* routine at this point. */

switch (value_type)
    case string:
        perform STRING_HASH (result,
                             hash_buffer);

    case small_integer:
        perform SMALL_INTEGER_HASH (result, MIN,
                                    hash_buffer);

    case large_integer:
        perform LARGE_INTEGER_HASH (result, MIN,
                                    range,
                                    hash_buffer);
end switch;

end procedure HASH_FUNC;
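
The choice between the two integer hashing functions can be illustrated with the following Python sketch. It follows the rules given in the specification (bucket = value - MIN for the small case, TRUNC[(value - MIN)/range] for the large case), written in integer arithmetic. The clamp on the last line is an assumption added for the example, so that a value equal to MAX still falls inside the table; the function names are hypothetical.

```python
def make_integer_hash(min_value, max_value, number_of_buckets):
    """Return a function mapping an integer attribute value to a bucket."""
    spread = max_value - min_value
    if spread < number_of_buckets:
        # small_integer case: every distinct value gets its own bucket.
        return lambda value: value - min_value

    def large_integer_hash(value):
        # large_integer case: TRUNC[(value - MIN) / range] with
        # range = (MAX - MIN) / number_of_buckets, using integers only.
        bucket = (value - min_value) * number_of_buckets // spread
        # Assumption: clamp so that value == MAX stays inside the table.
        return min(bucket, number_of_buckets - 1)

    return large_integer_hash
```

For example, with MIN = 0, MAX = 1000 and 8 buckets, each bucket covers a range of 125 consecutive values.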


procedure FIND_COMMON_ATTRIBUTE (input: request table;
                                 output: attribute name);

/******************************************************
 * This procedure is used to find the name of the
 * join attribute.
 * The join attribute is the first attribute of the
 * target list, so we can just go to the entry
 * where the target list begins and extract the first
 * attribute name and then return it to the calling
 * procedure.
 ******************************************************/

end procedure FIND_COMMON_ATTRIBUTE;


procedure ALLOCATE_BUFFER (input: request_id;
                           output: hash_buffer);

/*******************************************************/
/* This procedure is used to allocate a buffer for     */
/* storing the records and their hashed bucket number, */
/* set the length of the buffer to 0, and then         */
/* return the buffer to the calling procedure.         */
/*                                                     */
/* The data structures and the variables used in       */
/* this procedure are:                                 */
/* 1. hash_buffer:                                     */
/*    A variable of the data type hashing_buffer,      */
/*    which is defined in hashing_module.def           */
/*    (see Appendix G).                                */
/* 2. HB_ptr:                                          */
/*    A pointer to the hash_buffer.                    */
/* 3. HB_id:                                           */
/*    A field name of the hash_buffer that             */
/*    contains the request id of the records           */
/*    which belong to this buffer.                     */
/*******************************************************/

HB_ptr = allocate the hash buffer;
HB_ptr->HB_id = request_id;
HB_ptr->length = 0;

end procedure ALLOCATE_BUFFER;


procedure STRING_HASH (input: result_buffer, h_buffer);

/******************************************************
 * This procedure is called when the value type
 * of the common attribute is a character string.
 * It performs the following tasks:
 * 1. Extract records from the input result buffer
 *    one at a time.
 * 2. Extract the value of the join attribute
 *    from the extracted record and then check the
 *    lookup table to get the bucket number for
 *    the record.
 * 3. Store the bucket number and the record into
 *    a reserved hash buffer, h_buffer.
 * 4. If the hash buffer is full, then send the
 *    hash buffer to the bucket-block tracking
 *    procedure.
 *
 * Data structures and variables used in this
 * procedure are:
 * 1. attribute_value: A character-string
 *                     representation of the common
 *                     attribute value.
 * 2. record: A character-string representation
 *            of the extracted record.
 * 3. bucket_number: The bucket number where the
 *                   record characterized by the
 *                   common attribute value is
 *                   hashed into.
 * 4. bucket: A character-string representation
 *            of the bucket_number.
 * 5. EOV: The end-of-value marker.
 * 6. EON: The end-of-name marker.
 * 7. EOB: The end-of-buffer marker.
 * 8. LAST_RECORD: A boolean variable to indicate
 *                 that this record is the last
 *                 record for the request.
 * 9. i: The index for the length of the result
 *       buffer.
 *    j: A general purpose index.
 * 10. lookup: The lookup table, which is an array
 *             with 2048 character-string elements:
 *
 *                index   element
 *                   0    abal
 *                   1    abc
 *                 ...    ...
 *                2047    zyth
 *
 * 11. h_buffer: A variable of type hash_buffer
 *               which is defined in
 *               hashing_module.def (see Appendix G)
 *               and is used to store records and
 *               their hashed values.
 ******************************************************/


/* Get the lookup table. */

i = 1;
j = 0;
LAST_RECORD = false;

/* Get records from the result buffer one at a time. */
while result_buffer[i] <> EOB do

    /* Bypass the name of the common attribute. */
    while result_buffer[i] <> EON do
        i = i+1;
    end while;  /* Now, result_buffer[i] = EON. */
    i = i+1;

    /* Get the value of the join attribute. */
    j = 0;
    while result_buffer[i] <> EOV do
        attribute_value[j] = result_buffer[i];
        i = i+1;
        j = j+1;
    end while;  /* Now, result_buffer[i] = EOV. */

    /* Compare the common attribute value with */
    /* the contents of the lookup table to get the */
    /* bucket number. */
    bucket_number = BI_SEARCH (lookup, attribute_value);
    perform NUMBER_TO_STRING (bucket_number, bucket);

    /* Add an EOV marker to the end of
       the attribute value. */
    attribute_value[j] = EOV;

    /* Extract records from the buffer. */
    i = i+1;
    j = 0;
    repeat
        record[j] = result_buffer[i];
        i = i+1;
        j = j+1;
    until result_buffer[i-1] = EORecord;

    /* Now, record[j-1] = EORecord. */
    if result_buffer[i] = EORequest
    then
        LAST_RECORD = true;
        i = i+1;
    end if;

    /* Store the hashed information into the
       hash buffer, h_buffer. */
    perform PUT_HASH_BUFFER (h_buffer, bucket,
                             attribute_value, record,
                             LAST_RECORD);

end while;

end procedure STRING_HASH;
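
The lookup step of STRING_HASH can be sketched in Python with a binary search over a sorted table. The interpretation used here, namely that each table entry is the lowest string of its bucket and that a value belongs to the last entry not greater than it, is an assumption; the specification only states that the value is compared with the contents of the lookup table. The four-entry table stands in for the 2048-entry table of the thesis.

```python
import bisect

# Hypothetical, shortened lookup table; it must be sorted.
lookup = ["abal", "abc", "mno", "zyth"]

def bi_search(lookup, attribute_value):
    """Return the bucket number for a character-string attribute value."""
    position = bisect.bisect_right(lookup, attribute_value) - 1
    # Assumption: values below the first entry fall into bucket 0.
    return max(position, 0)
```

With 2048 entries a binary search needs at most 11 comparisons per record, which keeps the hashing cost low even for long result buffers.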


procedure PUT_HASH_BUFFER (input: h_buffer,
                                  bucket,
                                  attribute_value, record,
                                  LAST_RECORD;
                           output: h_buffer);

/******************************************************
 * This procedure is used to store the hashed
 * record information into the hash buffer.
 *
 * Data structures and variables used in this
 * procedure are:
 * 1. X,Y,Z,i,j,K: General purpose indexes.
 * 2. MAX: The predefined maximum length of the
 *         hash buffer.
 * 3. bucket: A character-string representation
 *            of bucket_number.
 * 4. record: The input record which is in the
 *            form of character string.
 * 5. LAST_RECORD: A boolean variable which is
 *                 used to indicate the end of
 *                 this request.
 * 6. h_buffer: A buffer which is used to store
 *              records and their hashed values.
 ******************************************************/

/* Check to see if the buffer has enough space for */
/* the new record. */

X = String_len (bucket);
Y = String_len (attribute_value);
Z = String_len (record);
K = the_current_length_of_the_hash_buffer;

if (K + X + Y + Z) > MAX
then
    /* The buffer is full, so it is sent to the */
    /* bucket-block tracking procedure. */
    perform BUCKET_BLOCK (h_buffer);

    /* Reset the length of the buffer to 0. */
    K = 0;
end if;

/* The buffer has enough space, so store the */
/* input record into the buffer. */
for i = 1 to X do
    K = K + 1;
    hash_result[K] = bucket[i];
end for;

for i = 1 to Y do
    K = K + 1;
    hash_result[K] = attribute_value[i];
end for;

for i = 1 to Z do
    K = K + 1;
    hash_result[K] = record[i];
end for;

/* If this is the last record of this request, */
/* then send the hash_buffer to the */
/* bucket-block tracking procedure. */
if LAST_RECORD
then
    hash_result[K+1] = EORequest;
    hash_result[K+2] = EOB;
    perform BUCKET_BLOCK (h_buffer);
    perform FREE_BUFFER_SPACE (h_buffer);
end if;

end procedure PUT_HASH_BUFFER;
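
The buffering behaviour of PUT_HASH_BUFFER can be sketched in Python as follows. The class name, the tiny maximum length and the flush callback (which stands in for the bucket-block tracking procedure) are hypothetical, and the EORequest and EOB markers are left out for brevity. The sketch flushes a full buffer first and then stores the incoming record, so that no record is dropped when the buffer fills.

```python
MAX_LENGTH = 16  # hypothetical maximum length of the hash buffer

class HashBuffer:
    def __init__(self, flush):
        self.pieces = []
        self.length = 0
        self.flush = flush  # stands in for the bucket-block procedure

    def put(self, bucket, attribute_value, record, last_record=False):
        piece = bucket + attribute_value + record
        if self.pieces and self.length + len(piece) > MAX_LENGTH:
            self._send()  # buffer is full: ship it, then start afresh
        self.pieces.append(piece)
        self.length += len(piece)
        if last_record:
            self._send()  # the request is complete: ship what is left

    def _send(self):
        self.flush("".join(self.pieces))
        self.pieces, self.length = [], 0
```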


procedure SMALL_INTEGER_HASH (input: result_buffer,
                                     MIN,
                                     h_buffer;
                              output: h_buffer);

/******************************************************
 * This procedure is used when the type of the
 * common attribute value is integer and when the
 * difference of the maximum and minimum value of
 * the common attribute value is less than the
 * number of the buckets of the hashing table.
 * It performs the following tasks:
 * 1. Extract records from the input result buffer
 *    one at a time.
 * 2. Extract the value of the common attribute from
 *    the extracted record and then calculate
 *    the bucket number.
 * 3. Store the bucket number and the record into
 *    a reserved hash buffer.
 * Data structures and variables used in this
 * procedure are:
 * 1. attribute_value: A character-string
 *                     representation of the common
 *                     attribute value.
 * 2. record: A character-string representation
 *            of the extracted record.
 * 3. bucket_number: The bucket number where the
 *                   record characterized by the
 *                   common attribute value is
 *                   hashed into.
 * 4. bucket: A character-string representation
 *            of the bucket_number.
 * 5. EOV: The end-of-value marker.
 * 6. EON: The end-of-name marker.
 * 7. EOB: The end-of-buffer marker.
 * 8. LAST_RECORD: A boolean variable to indicate
 *                 that this record is the last
 *                 record for the request.
 * 9. i: The index for the length of the result
 *       buffer.
 *    j: A general purpose index.
 *    k: The index for the length of the
 *       attribute_value.
 * 10. temp: An integer representation of the input
 *           attribute_value.
 * 11. h_buffer: A variable of type hash_buffer
 *               which is defined in
 *               hashing_module.def (see Appendix G)
 *               and is used to store records and
 *               their hashed values.
 ******************************************************/


/* Initialize the indexes. */
i = 1;
k = 1;
j = 0;
LAST_RECORD = false;

/* Get the records from the result buffer
   one at a time. */
while result_buffer[i] <> EOB do

    /* Bypass the name of the common attribute. */
    while result_buffer[i] <> EON do
        i = i+1;
    end while;  /* Now, result_buffer[i] is EON. */
    i = i+1;

    /* Get the value of the common attribute. */
    k = 1;
    while result_buffer[i] <> EOV do
        attribute_value[k] = result_buffer[i];
        i = i+1;
        k = k+1;
    end while;  /* Now, result_buffer[i] is EOV. */

    /* Compute the bucket number. */
    perform STRING_TO_NUMBER (attribute_value, Temp);
    bucket_number = Temp - MIN;
    perform NUMBER_TO_STRING (bucket_number, bucket);

    /* Add an EOV marker to the end of the attribute value. */
    attribute_value[k] = EOV;

    /* Get the attribute-value pairs of the actual */
    /* target list of the record. */
    i = i+1;
    j = 0;
    repeat
        record[j] = result_buffer[i];
        i = i+1;
        j = j+1;
    until result_buffer[i-1] = EORecord;

    /* Now, record[j-1] is EORecord. */
    if result_buffer[i] = EORequest
    then
        LAST_RECORD = true;
        i = i+1;
    end if;

    /* Store the hashed information into the h_buffer. */
    perform PUT_HASH_BUFFER (h_buffer, bucket,
                             attribute_value, record,
                             LAST_RECORD);

end while;

end procedure SMALL_INTEGER_HASH;


procedure LARGE_INTEGER_HASH (input: result_buffer,
                                     MIN, range,
                                     h_buffer;
                              output: hash_buffer);

/******************************************************
 * This procedure is used when the type of the
 * common attribute value is integer and when the
 * difference of the maximum and minimum value of
 * the common attribute value is greater than or
 * equal to the number of the buckets of the hashing
 * table.
 * It performs the following tasks:
 * 1. Extract records from the input result buffer
 *    one at a time.
 * 2. Extract the value of the common attribute from
 *    the extracted record and then calculate
 *    the bucket number.
 * 3. Store the bucket number and the record into
 *    a reserved hash buffer.
 * Data structures and variables used in this
 * procedure are:
 * 1. attribute_value: A character-string
 *                     representation of the common
 *                     attribute value.
 * 2. record: A character-string representation
 *            of the extracted record.
 * 3. bucket_number: The bucket number where the
 *                   record characterized by the
 *                   common attribute value is
 *                   hashed into.
 * 4. bucket: A character-string representation
 *            of the bucket_number.
 * 5. EOV: The end-of-value marker.
 * 6. EON: The end-of-name marker.
 * 7. EOB: The end-of-buffer marker.
 * 8. LAST_RECORD: A boolean variable to indicate
 *                 that this record is the last
 *                 record for the request.
 * 9. i: The index for the length of the result
 *       buffer.
 *    j: A general purpose index.
 *    k: The index for the length of the
 *       attribute_value.
 * 10. temp: An integer representation of the input
 *           attribute_value.
 * 11. h_buffer: A variable of type hash_buffer
 *               which is defined in
 *               hashing_module.def (see Appendix G)
 *               and is used to store records and
 *               their hashed values.
 ******************************************************/


/* Initialize the indexes. */

i = 1;
k = 1;
j = 0;

LAST_RECORD = false;

/* Get records from the result buffer one at a time. */

while result_buffer[i] <> EOB do
    /* Bypass the name of the common attribute. */
    while result_buffer[i] <> EON do
        i = i+1;
    end while;  /* Now, result_buffer[i] is EON. */
    i = i+1;

    /* Get the value of the join attribute. */
    while result_buffer[i] <> EOV do
        attribute_value[k] = result_buffer[i];
        i = i+1;
        k = k+1;
    end while;  /* Now, result_buffer[i] is EOV. */

    /* Compute the bucket number. */
    perform STRING_TO_NUMBER(attribute_value, temp);
    bucket_number = TRUNC[(temp - MIN)/range];
    perform NUMBER_TO_STRING(bucket_number, bucket);

    /* Add an EOV marker to the end of attribute_value. */
    attribute_value[k] = EOV;

    /* Get the attribute-value pairs of the actual */
    /* target list of the record. */
    i = i+1;
    j = 0;
    repeat
        record[j] = result_buffer[i];
        i = i+1;
        j = j+1;
    until result_buffer[i-1] = EORecord;

    /* Now, record[j-1] is EORecord. */
    if result_buffer[i] = EORequest
    then
        LAST_RECORD = true;
        i = i+1;
    end if;

    /* Store the hashed information into the h_buffer. */
    perform PUT_HASH_BUFFER(h_buffer, bucket,
                            attribute_value,
                            record,
                            LAST_RECORD);

    /* Reset the attribute-value index for the next record. */
    k = 1;
end while;

end procedure LARGE_INTEGER_HASH;
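
The bucket computation in LARGE_INTEGER_HASH, TRUNC[(temp - MIN)/range], can be illustrated with a short Python sketch. The specification above does not define how `range` is obtained, so the sketch assumes it is the width of the value interval covered by one bucket, rounded up so that every value maps to a valid bucket. The function name and its parameters are hypothetical and are not part of MBDS.

```python
def large_integer_bucket(value: int, min_value: int, max_value: int,
                         num_buckets: int) -> int:
    """Map an integer attribute value to a bucket number by range partitioning."""
    if not min_value <= value <= max_value:
        raise ValueError("value lies outside [min_value, max_value]")
    span = max_value - min_value + 1
    # Assumed definition of `range`: ceiling of span / num_buckets,
    # computed with integer arithmetic only.
    width = -(-span // num_buckets)
    # For a non-negative numerator, floor division equals TRUNC.
    return (value - min_value) // width


if __name__ == "__main__":
    # Values 0..9999 spread over 2048 buckets: each bucket covers 5 values.
    print(large_integer_bucket(0, 0, 9999, 2048))
    print(large_integer_bucket(9999, 0, 9999, 2048))
```

Because consecutive values share a bucket, two records with equal common attribute values always land in the same bucket, which is the property the later merging step relies on.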




THE  EOCKET-BLOCK-Tfi ECK1I6  PfiOCBDOBE  PBOGBAH  SPECIFICAIIOHS 
procedure BUCKET_BLOCK(input: H_buffer);

/*****************************************************
*  This procedure receives a hash buffer, H_buffer,
*  from the ret_com subfunction and performs the
*  following tasks:
*  1. Establish and maintain a global table to
*     store the addresses of the hashing tables
*     of all the requests.
*  2. Extract the hashed record information from
*     the input hash_buffer.
*  3. Check the global table to see if the input
*     records belong to a new request. If they do,
*     then allocate a new hashing table.
*     Otherwise, get the logical address of the
*     hashing table from the global table and
*     assign a pointer to the hashing table.
*  4. Group records into the buckets according to
*     their bucket numbers and store them into
*     blocks.
*  5. Broadcast the bucket information of the local
*     target records to the other backends.
*  6. Store the hashing table back to the secondary
*     storage.
*
*  Data structures and variables used in this
*  procedure are:
*
*  1. FIRST_RET_COM:
*        A boolean variable which is set to
*        true when the first retrieve-common
*        request enters the system.
*  2. GT_ptr:
*        A pointer to a global table.
*  3. G_table:
*        A variable of type global table (see
*        Appendix G).
*  4. HT_ptr:
*        A pointer to a hashing table.
*  5. HT:
*        A variable of type Hash_table (see
*        Appendix G).
*  6. HB_ptr:
*        A pointer to a hash buffer.
*  7. H_buffer:
*        A variable of type hash_buffer (see
*        Appendix G).
*  8. NEW_REQUEST:
*        A boolean variable which is set to
*        true if the request id cannot be found
*        in the global table.
*  9. logical_addr:
*        A variable of type addr_definition,
*        which is defined in the commdata.def file.
*  10. bucket_number:
*        The bucket number into which the record
*        characterized by the attribute value is
*        hashed.
*  11. bucket:
*        A character-string representation of
*        the bucket_number.
*  12. req_id:
*        A record which contains the traffic id and
*        request number of a request.
*  13. i, j:
*        General purpose indexes.
*****************************************************/

if FIRST_RET_COM
then
    perform INITIALIZE_GLOBAL_TABLE(GT_ptr);
    FIRST_RET_COM = false;
end if;

/* Get the request id from the input hash buffer. */

req_id = H_buffer.Request_id;

/* Check the global table to see if this request is */
/* a new request. */

perform CHECK_GLOBAL_TABLE(GT_ptr, req_id,
                           logical_addr, NEW_REQUEST);

if NEW_REQUEST
then
    perform ALLOCATE_HASH_TABLE(logical_addr);
    perform INSERT_GLOBAL_TABLE(GT_ptr, req_id,
                                logical_addr);
end if;

perform GET_HASHING_TABLE(req_id,
                          logical_addr, HT);

/* Now, the hashing table is ready to store records. */

/* Extract the record information from the */
/* hash buffer one record at a time. */
/* Because the last two characters of the hash buffer */
/* are the EORequest marker, which indicates whether */
/* this is the last hash buffer for this request, */
/* and the EOBuffer marker, which indicates the */
/* end of this hash buffer, the actual length of the */
/* hash buffer is length-2. */

j = 1;

while j < (H_buffer.length-2) do
    /* Get the bucket number. */
    i = 0;
    repeat
        bucket[i] = H_buffer.Hashed_result[j];
        i = i+1;
        j = j+1;
    until H_buffer.Hashed_result[j] = EOV;

    /* Convert the bucket number from a character string */
    /* to an integer. */

    bucket_number = STRING_TO_INTEGER(bucket);

    /* Get the common attribute value and the record */
    /* itself. */

    j = j+1;
    i = 0;
    repeat
        common_and_record[i] = H_buffer.Hashed_result[j];
        i = i+1;
        j = j+1;
    until common_and_record[i-1] = EORecord;

    /* Store the record and its common attribute value */
    /* into the hashing table. */

    perform STORE_RECORD_IN_HASH_TABLE(HT, bucket_number,
                                       common_and_record,
                                       NEW_RECORD);

    NEW_RECORD = false;
end while;

/* Check if this is a target request. */

if MOD(req_id.request_no, 2) = 0
then
    /* This is a target request. */
    perform BROADCAST_TARGET_INFO(HT);
end if;

perform STORE_BACK(HT, logical_addr);
end procedure BUCKET_BLOCK;
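
The extraction loop of BUCKET_BLOCK can be sketched in Python as follows. This is only an illustration under stated assumptions: the marker characters are hypothetical control characters (the specification does not give their actual values), each entry is assumed to be laid out as bucket number, EOV, common attribute value and record, EORecord, and the function name group_by_bucket is invented for the example.

```python
from collections import defaultdict

# Hypothetical marker characters; the actual MBDS values are not given here.
EOV, EOREC, EOB = "\x1f", "\x1d", "\x1c"


def group_by_bucket(hashed_result: str):
    """Split a hash buffer into {bucket_number: [payload, ...]}.

    The last two characters are the EORequest flag ('1' or '0') and EOB,
    so only the text before them is scanned, as in the pseudocode.
    """
    if len(hashed_result) < 2 or hashed_result[-1] != EOB:
        raise ValueError("buffer must end with the EORequest flag and EOB")
    last_buffer = hashed_result[-2] == "1"
    body = hashed_result[:-2]

    buckets = defaultdict(list)
    pos = 0
    while pos < len(body):
        eov = body.index(EOV, pos)            # end of the bucket number
        bucket_number = int(body[pos:eov])
        eorec = body.index(EOREC, eov + 1)    # end of value-plus-record
        buckets[bucket_number].append(body[eov + 1:eorec])
        pos = eorec + 1
    return dict(buckets), last_buffer
```

A caller would pass the Hashed_result string of one hash buffer and receive the records already grouped by bucket, together with a flag that tells whether more buffers of the same request are expected.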


procedure INITIALIZE_GLOBAL_TABLE(output: GT_ptr);

/*****************************************************
*  This procedure is used when the first retrieve-
*  common request is executed in the BUCKET_BLOCK
*  procedure.
*  This procedure creates a global table and
*  returns the pointer (GT_ptr) to the table to
*  the calling procedure.
*****************************************************/

end procedure INITIALIZE_GLOBAL_TABLE;


procedure ALLOCATE_HASH_TABLE(output: logical_addr);

/*****************************************************
*  This procedure is used to allocate a hashing
*  table for a new retrieve-common request from
*  a predefined secondary storage area and return
*  the logical disk address to the calling
*  procedure.
*  The bucket entries are also initialized.
*****************************************************/

end procedure ALLOCATE_HASH_TABLE;


procedure CHECK_GLOBAL_TABLE(input: GT_ptr, request_id;
                             output: logical_addr, NEW_REQUEST);

/*****************************************************
*  This procedure is used to check whether a request
*  is a new request by checking its request id
*  against the global table. If the request id is
*  found in the global table, then set the value of
*  NEW_REQUEST to false and return the logical disk
*  address of the hashing table to the calling
*  procedure. Otherwise, set NEW_REQUEST to true and
*  return it to the calling procedure.
*****************************************************/

end procedure CHECK_GLOBAL_TABLE;


procedure INSERT_GLOBAL_TABLE(input: GT_ptr, Req_id,
                                     logical_addr;
                              output: GT_ptr);

/*****************************************************
*  This procedure is used to insert a new hashing
*  table into the global table.
*
*  Data structures and variables used in this
*  procedure are:
*  1. GT_ptr:
*        A pointer to the global table.
*  2. Req_id:
*        The request id of the records of the new
*        hashing table.
*  3. logical_addr:
*        The logical disk address of the new hashing
*        table.
*
*  An inverted list implementation to maintain the
*  table is recommended.
*****************************************************/

end procedure INSERT_GLOBAL_TABLE;
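
The interplay of CHECK_GLOBAL_TABLE and INSERT_GLOBAL_TABLE can be sketched as a small lookup structure. The specification recommends an inverted list; the sketch below deliberately substitutes a Python dictionary for brevity, so it shows the interface rather than the recommended implementation. The class name, method names, and the representation of a request id as a (traffic id, request number) tuple are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

RequestId = Tuple[int, int]  # assumed form: (traffic id, request number)


class GlobalTable:
    """Maps request ids to the logical disk addresses of their hashing tables."""

    def __init__(self) -> None:
        self._addresses: Dict[RequestId, int] = {}

    def check(self, req_id: RequestId) -> Tuple[Optional[int], bool]:
        """Return (logical_addr, NEW_REQUEST) for the given request id."""
        if req_id in self._addresses:
            return self._addresses[req_id], False
        return None, True

    def insert(self, req_id: RequestId, logical_addr: int) -> None:
        """Register the hashing table of a new request."""
        self._addresses[req_id] = logical_addr
```

In BUCKET_BLOCK terms, a caller would first call check; when the second result is true it would allocate a hashing table and then call insert with the new address.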


procedure STORE_RECORD_IN_HASH_TABLE
              (input: HT, bucket_number,
                      info, NEW_RECORD);

/*****************************************************
*  This procedure is used to store the common
*  attribute value of a record and the record itself
*  into a hashing table.
*  Recall that the records are stored in blocks.
*
*  Data structures and variables used in this
*  procedure are:
*  1. HT:
*        A variable of type hash_table which is
*        defined in hashing_module.def (see Appendix
*        G).
*  2. bucket_number:
*        The bucket number into which the record
*        characterized by the common attribute value
*        is hashed.
*  3. info:
*        A character string which contains the
*        common attribute value of a record and the
*        record itself.
*  4. NEW_RECORD:
*        A boolean variable to indicate whether the
*        input info is the first record of this
*        request id.
*  5. old_bucket_number:
*        The bucket_number of the previous input
*        record.
*  6. bkt:
*        A variable of type BUCKET_ENTRY which is
*        defined in hashing_module.def (see Appendix
*        G).
*  7. blk_ptr:
*        A pointer to a record block of type
*        REC_BLOCK which is defined in
*        hashing_module.def (see Appendix G).
*  8. blk, blk_2:
*        Variables of type REC_BLOCK which is defined
*        in hashing_module.def (see Appendix G).
*  9. I:
*        An integer variable.
*  10. MAX_BLOCK_SIZE:
*        An integer that represents the maximum
*        length of the block contents.
*****************************************************/

if NEW_RECORD
then
    /* This record is the first input record of this */
    /* request. */

    perform GET_THE_BUCKET(HT, bucket_number, bkt);
    perform ALLOCATE_REC_BLOCK(blk, addr);
    perform MODIFY_ENTRY_&_HEADER(bkt, blk, addr);
else
    /* Compare the input bucket_number with the
       previous one. */

    if bucket_number <> old_bucket_number
    then
        perform STORE_BACK(blk);

        /* Get the desired bucket entry for this
           input record. */

        bkt = HT.bkt_entries[bucket_number];

        /* Check if the bucket is empty. */
        if bkt.status = empty
        then
            perform ALLOCATE_REC_BLOCK(blk, addr);
            perform MODIFY_ENTRY_&_HEADER(bkt,
                                          blk, addr);
        else
            /* Get the record block by the address */
            /* in the bucket entry. */

            perform GET_REC_BLOCK(bkt.block_address,
                                  blk);
        end if;
    end if;

    /* Check if the block has enough space to */
    /* store this record. */

    I = STRING_LENGTH(info);

    if (blk.header.length + I) > MAX_BLOCK_SIZE
    then
        /* This block does not have enough space */
        /* for this record. */

        perform ALLOCATE_REC_BLOCK(blk_2,
                                   addr_2);
        perform MODIFY_ENTRY_&_HEADER(bkt,
                                      blk_2,
                                      addr_2);

        /* This routine will also modify */
        /* the header of blk_2. */
        perform STORE_BACK(blk);
        blk = blk_2;
    end if;
end if;

perform STORE_INFO_IN_BLOCK(info, blk);
end procedure STORE_RECORD_IN_HASH_TABLE;
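
The block-overflow handling above can be illustrated with a compact Python sketch. It makes several simplifying assumptions that are not in the specification: blocks live in memory and are linked by object references rather than logical disk addresses, the bucket table is a dictionary, and MAX_BLOCK_SIZE is set to a tiny value so that the overflow is easy to observe. The names RecBlock and store_record are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

MAX_BLOCK_SIZE = 16  # deliberately tiny, for demonstration only


@dataclass
class RecBlock:
    contents: str = ""
    next_block: Optional["RecBlock"] = None  # stands in for next_blk_addr


def store_record(table: Dict[int, RecBlock], bucket_number: int,
                 info: str) -> None:
    """Append info to the bucket's current block, chaining a new block if full."""
    head = table.get(bucket_number)
    if head is None or len(head.contents) + len(info) > MAX_BLOCK_SIZE:
        # Mirrors MODIFY_ENTRY_&_HEADER: the new block points to the old
        # head, and the bucket entry is redirected to the new block.
        head = RecBlock(next_block=head)
        table[bucket_number] = head
    head.contents += info
```

As in the pseudocode, the most recently allocated block is always the one referenced by the bucket entry, so the chain is traversed from the newest block to the oldest.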


procedure STORE_BACK(input: A_structure);

/*****************************************************
*  This procedure is used to store a hashing table
*  or a record block back to the secondary storage.
*
*  A_structure is a variable which may be either
*  a hashing table or a block.
*****************************************************/

end procedure STORE_BACK;


procedure GET_REC_BLOCK(input: logical_addr;
                        output: blk);

/*****************************************************
*  This procedure is used to bring a block
*  from a predefined secondary storage area into the
*  primary memory by its logical address.
*
*  Data structures and variables used in this
*  procedure are:
*  1. logical_addr:
*        The logical address of a block.
*        A variable of type addr_definition which is
*        defined in the commdata.def file.
*  2. blk:
*        A variable of type REC_BLOCK which is defined
*        in hashing_module.def (see Appendix G).
*****************************************************/

end procedure GET_REC_BLOCK;


procedure STORE_INFO_IN_BLOCK(input: info, blk);

/*****************************************************
*  This procedure is used to store the common
*  attribute value of a record and the record
*  itself into a block.
*  It is called only when the block has enough
*  space for that information, i.e., info.
*
*  Data structures and variables used in this
*  procedure are:
*  1. info:
*        A character string which contains the
*        common attribute value of a record and
*        the record itself.
*  2. blk:
*        A variable of type REC_BLOCK which is
*        defined in hashing_module.def (see
*        Appendix G).
*  3. i, j:
*        General purpose indexes.
*****************************************************/

i = 0;
j = blk.header.length+1;
repeat
    blk.contents[j] = info[i];
    i = i+1;
    j = j+1;
until i = STRING_LENGTH(info);
end procedure STORE_INFO_IN_BLOCK;


procedure MODIFY_ENTRY_&_HEADER(input: bkt, blk,
                                       blk_addr;
                                output: bkt, blk);

/*****************************************************
*  This procedure is used to modify the bucket
*  entry of the input bkt and the header part
*  of the input blk. It then returns the
*  modified bkt and blk to the calling
*  procedure.
*
*  Data structures and variables used in this
*  procedure are:
*  1. bkt:
*        A variable of type Bucket_entry
*        which is defined in hashing_module.def
*        (see Appendix G).
*  2. blk:
*        A variable of type REC_BLOCK which
*        is defined in hashing_module.def
*        (see Appendix G).
*  3. blk_addr:
*        A variable of type addr_definition
*        which is the logical address of a block
*        and is defined in the commdata.def file.
*****************************************************/

blk.header.next_blk_addr = bkt.block_address;
bkt.block_address = blk_addr;
end procedure MODIFY_ENTRY_&_HEADER;


procedure BROADCAST_TARGET_INFO(input: HT);

/*****************************************************
*  This procedure is used to broadcast the records
*  of the target hashing table to the other
*  backends.
*  This is the same procedure that is used to
*  broadcast the descriptor ids among backends.
*
*  Data structures and variables used in this
*  procedure are:
*  1. HT:
*        A variable of type hashing_table
*        which is defined in hashing_module.def
*        (see Appendix G).
*  2. i:
*        A general purpose index.
*  3. MAX_BKT_#:
*        An integer which is used to represent the
*        maximum number of the bucket entries in a
*        hashing table.
*  4. bkt:
*        A variable of type Bucket_entry which
*        is defined in hashing_module.def (see
*        Appendix G).
*  5. msg:
*        A character string which is used to store
*        the message that is to be broadcast to all
*        of the backends.
*****************************************************/

for i = 1 to MAX_BKT_# do
    bkt = HT.bkt_entries[i];
    if bkt.status <> empty
    then
        /* Put the bucket number into the message. */
        perform GET_REC_BLOCK(bkt.block_address, blk);
        last = false;

        repeat
            /* Extract the contents of */
            /* blk.contents and copy them into msg. */
            if the msg is full
            then
                send msg to all of the backends;
                reset the length of msg to 0;
            end if;

            if blk.next_blk_address = blk.own_address
            then
                /* This block is the last block for
                   this bucket. */
                last = true;
            else
                /* Bring in the next block of this bucket. */
                perform GET_REC_BLOCK(blk.next_blk_address, blk);
            end if;
        until last;
    end if;
end for;

send the msg to all of the other backends;
end procedure BROADCAST_TARGET_INFO;


procedure MERGE(input: source_request_id,
                       logical_address_of_source_table,
                       logical_address_of_target_table);

/*****************************************************
*  This procedure is used to perform the merging
*  operation over the source records and the target
*  records.
*  Notice that the input addresses are the logical
*  disk addresses of the two hashing tables.
*
*  Data structures and variables used in this
*  procedure are:
*  1. logical_address_of_source_table,
*     logical_address_of_target_table:
*        The logical disk addresses of the source
*        and the target hashing tables, both of the
*        type address_definition which is defined in
*        the commdata.def file.
*  2. source_table, target_table:
*        Variables of hashing_table data type (see
*        Appendix G) that represent the source hashing
*        table and the target hashing table.
*  3. i:
*        A general purpose index.
*  4. max_bucket_number:
*        The largest bucket number of a hashing table.
*****************************************************/


/* Retrieve the two hashing tables by the input */
/* logical addresses. */
/* Note: Due to the limited memory space, we may */
/* not be able to bring in the entire table. */
perform GET_HASH_TABLE(logical_address_of_source_table,
                       source_table);
perform GET_HASH_TABLE(logical_address_of_target_table,
                       target_table);

/* Reserve a result buffer. */

perform GET_BUFFER(result_buffer, source_request_id);

/* This routine will allocate an instance of a
   result buffer, put the request id into the
   header of the buffer, and initialize the
   length of the buffer to 0.

   This routine has already been coded in
   the retp.c file. */

i = 0;

while i < max_bucket_number do
    if [(source_table.bucket_entry[i].status <> empty)
        and
        (target_table.bucket_entry[i].status <> empty)]
    then
        /* There is a collision. */

        /* Retrieve the records from both blocks and
           perform the merging operation. */

        X = source_table.bucket_entry[i].logical_address;
        Y = target_table.bucket_entry[i].logical_address;
        perform MERGING_OPERATION(X, Y, result_buffer);

        /* This routine will perform the merging
           operation and send the merged results
           to the controller. */
    end if;
    i = i+1;
end while;

/* Signal PP upon the completion of the source and */
/* target request. */
end procedure MERGE;
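
The bucket-by-bucket merge driven by MERGE, together with the record comparison it delegates to the merging operation, can be sketched in Python as follows. The sketch assumes that both hashing tables fit in memory and are represented as dictionaries from bucket number to a list of (common attribute value, record) pairs; the real procedures work on disk blocks and send full result buffers to the controller. The function name merge_tables is hypothetical.

```python
from typing import Dict, List, Tuple

Bucket = List[Tuple[str, str]]  # (common attribute value, record)


def merge_tables(source: Dict[int, Bucket],
                 target: Dict[int, Bucket]) -> List[str]:
    """Merge source and target records that share a common attribute value."""
    results: List[str] = []
    # Only buckets that are non-empty in both tables can contain matches.
    for bucket_number in sorted(source.keys() & target.keys()):
        for source_value, source_record in source[bucket_number]:
            for target_value, target_record in target[bucket_number]:
                if source_value == target_value:
                    # Append the target record to the end of the source record.
                    results.append(source_record + target_record)
    return results


if __name__ == "__main__":
    src = {3: [("10", "S10;"), ("11", "S11;")], 5: [("20", "S20;")]}
    tgt = {3: [("10", "T10a;"), ("10", "T10b;")], 7: [("20", "T20;")]}
    print(merge_tables(src, tgt))
```

Skipping buckets that are empty on either side is what limits the comparisons to records that could possibly match, since equal values always hash to the same bucket number.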


procedure MERGING_OPERATION
              (input: logical_address_source_block,
                      logical_address_target_block,
                      result_buffer;
               output: result_buffer);

/*****************************************************
*  This procedure is used to perform the following
*  tasks:
*  1. Extract the records from both the source
*     block and the target block.
*  2. Compare the common attribute values
*     of the source and target records.
*     If they are equal, then perform the merging
*     operation.
*  3. Put the merged results into a result buffer.
*     If the buffer is full, then send the buffer
*     to the controller and reinitialize the
*     buffer length to 0 so that the buffer can
*     be reused.
*     Otherwise, return the logical address of
*     the result buffer to the calling procedure.
*
*  Data structures and variables used in this
*  procedure are:
*  1. source_block, target_block:
*        Variables of the data type BKT_BLK which
*        are used to represent the blocks of the
*        source hashing table or the target hashing
*        table.
*        BKT_BLK is defined in hashing_module.def
*        (see Appendix G).
*  2. source_done, target_done:
*        Boolean variables which are used to indicate
*        the completion of processing either source
*        records or target records.
*  3. i, j:
*        General purpose indexes.
*****************************************************/

/* Continue retrieving the source blocks by the */
/* logical address until there are no more blocks. */
repeat
    source_block =
        GET_BLOCK(logical_address_source_block);

    /* Continue retrieving the target blocks by the */
    /* logical address until there are no more blocks. */
    repeat
        target_block =
            GET_BLOCK(logical_address_target_block);

        i = 0;

        while source_block.body[i] <> EOB do
            /* Retrieve one common attribute value and one */
            /* record from the source block. */

            source_value = GET_VALUE(source_block.body, i);
            source_record = GET_RECORD(source_block.body, i);

            j = 0;

            while target_block.body[j] <> EOB do
                /* Retrieve one common attribute value and */
                /* one record from the target block. */

                target_value = GET_VALUE(target_block.body, j);
                target_record =
                    GET_RECORD(target_block.body, j);
                if source_value = target_value
                then
                    /* Append target record at the end of */
                    /* source record and put the newly */
                    /* merged record into the result buffer. */
                    result = APPEND(source_record,
                                    target_record);

                    result_length = STRING_LENGTH(result);
perform  BBSPOT  SEND (result_buff er. 


result, 

result_length) ; 

                end if;

                /* Go to the next target record. */
                j = j+1;
            end while;  /* End the target-record loop. */
            i = i+1;
        end while;  /* End the source-record loop. */

        /* Are the target records done? */
        if target_block.header.next_block_address =
           target_block.header.this_block_address
        then
            target_done = true;
        else
            /* Advance to the next target block. */
            logical_address_target_block =
                target_block.header.next_block_address;
        end if;
    until target_done;

    /* Are the source records done? */
    if source_block.header.next_block_address =
       source_block.header.this_block_address
    then
        source_done = true;
    else
        /* Advance to the next source block. */
        logical_address_source_block =
            source_block.header.next_block_address;
    end if;
until source_done;
end procedure MERGING_OPERATION;


In this appendix we present the definitions of the data
structures used in the previous appendices. We refer to the
definitions as hashing_module.def.

1. hash_buffer:

This is the buffer which stores the hashed information
of records.

    Request_id      --> The request id of the hashed records.
    length          --> The current length of the Hashed_results.
    Hashed_results  --> An array of characters used for
                        storing the hashed records.

The format of the Hashed_results is:

    {hashed_record_info}* EOReq EOB
where
    hashed_record_info ::= bucket_number EOV {Rec}*
    Rec ::= {attribute_value_pair}* EORec
    attribute_value_pair ::=
        attribute_name EON attribute_value EOV
    '*' means one or more occurrences.

    EOB:   A special character which is used as a marker
           for the end-of-buffer.
    EOV:   A special character which is used as a marker
           for the end-of-value.
    EON:   A special character which is used as a marker
           for the end-of-attribute_name.
    EORec: A special character which is used as a marker
           for the end-of-record.
    EOReq: A character, either 1 or 0, which is used
           to indicate the end of a request.
           1: end of a request.
           0: not end of request; more buffers are coming.

2. REC_BLOCK:

Blocks used by buckets to store the records and their
common attribute values.
A REC_BLOCK is composed of a header of two fields
and a contents.

    header    --> This part contains the status
                  of this block.
    contents  --> This part contains the records
                  and their common attribute values.

The format of the contents of the REC_BLOCK is:

    {Rec}+ EOB

The header contains two parts:

    length         --> An integer to indicate the total
                       length of the records in this
                       block.
    next_blk_addr  --> The logical address of the next
                       block of the same bucket. (If
                       this block is the first block of
                       the bucket, then a null address
                       will be put in here.)
                       The type of this field is
                       address_definition and is
                       defined in the commdata.def file.

3. Bucket_entry:

    status         --> A character which is either 1 for
                       not empty or 0 for empty.
    block_address  --> The logical address of the block
                       of this bucket.

4. Hash_table: An array of 2048 bucket_entries.
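
The Hashed_results format defined above can be made concrete with a small Python encoder. This is a sketch under stated assumptions: the marker characters are hypothetical control characters chosen for the example, and the function names encode_rec and encode_hashed_results do not appear in hashing_module.def.

```python
from typing import Iterable, List, Tuple

# Hypothetical marker characters; the actual values are not given above.
EON, EOV, EOREC, EOB = "\x1e", "\x1f", "\x1d", "\x1c"

Pair = Tuple[str, str]  # (attribute_name, attribute_value)


def encode_rec(pairs: Iterable[Pair]) -> str:
    """Rec ::= {attribute_name EON attribute_value EOV} EORec"""
    return "".join(f"{name}{EON}{value}{EOV}" for name, value in pairs) + EOREC


def encode_hashed_results(entries: Iterable[Tuple[int, List[List[Pair]]]],
                          last_buffer: bool) -> str:
    """Hashed_results ::= {bucket_number EOV {Rec}} EOReq EOB"""
    body = "".join(
        f"{bucket}{EOV}" + "".join(encode_rec(rec) for rec in recs)
        for bucket, recs in entries
    )
    # EOReq is the character 1 for the last buffer of a request, else 0.
    return body + ("1" if last_buffer else "0") + EOB
```

For example, a single record with the pairs (name, Ann) and (age, 30) hashed into bucket 5 would be passed as [(5, [[("name", "Ann"), ("age", "30")]])].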






